ABSTRACT

A method and system that operate as a background process to automatically identify and merge duplicate files into single instance files, wherein the duplicate files become independent links to the single instance files. A groveler maintains a database of information about the files on a volume, including a file size and checksum (signature) based on the file contents. The groveler periodically acts in the background to scan the USN log, a log that dynamically records file system activity. New or modified files detected in the USN log are queued as work items, each work item representing a file. The volume may be scanned to add work items to the queue, which takes place initially or when there is a potential problem with the USN log. The groveler periodically removes items from the queue, calculates the signature of the corresponding file contents, and uses the signature and file size to query the database for matching files. The groveler then compares any matching files with the file corresponding to the work item for an exact duplicate, and if found, calls a single instance store facility to merge the files and create independent links to those files.

METHOD AND SYSTEM FOR AUTOMATICALLY MERGING FILES INTO A SINGLE INSTANCE STORE

TECHNICAL FIELD

The invention relates generally to computer systems and data storage, and more particularly to identifying and merging files of a file system having common properties.

BACKGROUND OF THE INVENTION

The contents of a file of a file system may be identical to the contents stored in one or more other files. While some file duplication tends to occur on even an individual user's personal computer, duplication is particularly prevalent on networks set up with a server that centrally stores the contents of multiple personal computers. For example, with a remote boot facility on a computer network, each user boots from that user's private directory on a file server. Each private directory thus ordinarily includes a number of files that are identical to files on other users' directories. As can be readily appreciated, storing the private directories on traditional file systems consumes a great deal of disk and server file buffer cache space.

Techniques that have been used to reduce the amount of used storage space include linked-file or shared memory techniques, essentially storing the data only once. However, when these techniques are used in a file system, the files are not treated as logically separate files. For example, if one user makes a change to a linked-file, or if the contents of the shared memory change, every other user linked to that file sees the change. This is a significant drawback in a dynamic environment where files do change, even if not very frequently. For example, in many enterprises, different users need to maintain different versions of files at different times, including traditionally read-only files such as applications. As a result, linked-file techniques would work well for files that are strictly read-only, but these techniques fail to provide the flexibility needed in a dynamic environment.

Another problem with these techniques is that identifying identical files becomes a complex task as the number of files on a file system volume increases. For example, a disk drive may store thousands of files, and each time a new file is written to a disk or a file is changed, a potential for file duplication exists. At times a user may know when files are duplicates of one another, and thus can manually request that the file data be shared; however, relying on a user to detect such conditions is unpredictable and, for large numbers of files, inefficient and/or impractical. One possible solution is to run a utility at system start-up that scans a file system's files for duplicates; however, this solution becomes unacceptably slow even with only a few thousand documents. Moreover, such a solution would not work well for users who seldom reboot a machine. Indeed, as more and more disk space is consumed, sharing files becomes a more valuable tool for preserving disk space, and thus a real-time solution could reclaim space when most needed. However, scanning even a relatively modest number of files in a file system volume for one or more duplicates, such as each time that a file is closed, consumes a great deal of time and machine resources, and thus is also impractical.

SUMMARY OF THE INVENTION

Briefly, the present invention provides a method and system for automatically identifying common files of a file system and merging those files into a single instance of the data, having one or more logically separate links thereto representing the original files. A groveler facility maintains a database of information about the files on a volume of a file system, the information for each file including a file size and checksum (signature) based on the file contents. The groveler includes a component that periodically acts in the background to scan the USN log, a log that dynamically records file system activity, whereby new or modified files detected in the USN log are queued as work items, each work item representing a file. The volume may be scanned to add work items to the queue, which takes place initially when the queue is created, or when there is a potential problem with the USN log.

The groveler includes another component that periodically removes items from the queue, calculates the signature of the corresponding file contents, and uses the signature and file size to query the database for matching files. The groveler component then compares any matching files with the file corresponding to the work item for an exact duplicate, and if found, calls a single instance store facility to merge the files and create independent links to those files.

Other advantages will become apparent from the following detailed description when taken in conjunction with the drawings, in which:

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram representing a computer system into which the present invention may be incorporated;
FIGS. 2A-2B are block diagrams representing various components for working with single instance store (SIS) link files and SIS common store files, including the automatic identifying and merging of duplicate files in accordance with an aspect of the present invention;
FIG. 3 is a block diagram representing various components of a groveler for automatically identifying duplicate files for merging, in accordance with an aspect of the present invention;
FIG. 4 is a block diagram representing various components connected to a groveler worker object of FIG. 3;
FIGS. 5 and 6 comprise a flow diagram generally representing the steps taken to call functions of a groveler worker to automatically identify and merge duplicate files in accordance with one aspect of the present invention;
FIG. 7 is a flow diagram generally representing the steps taken by the groveler worker open function;
FIG. 8 is a flow diagram generally representing the steps taken by the groveler worker extract log function;
FIG. 9 is a flow diagram generally representing the steps taken by the groveler worker grovel function;
FIG. 10 is a block diagram representing various components of a SIS file and SIS common store file;
FIGS. 11A-11B comprise a flow diagram generally representing the steps taken to merge duplicate files into a SIS common store file;
FIG. 12 is a representation of a SIS link file open request passing through a preferred SIS and file system architecture;
FIGS. 13A and 13B comprise a flow diagram generally representing the steps taken by the SIS facility to handle the open request represented in FIG. 12;
FIG. 14 is a representation of a SIS link file write request passing through a preferred SIS facility;
FIG. 15 is a flow diagram generally representing the steps taken by the SIS facility to handle the write request represented in FIG. 14;
FIG. 16 is a representation of a SIS link file read request passing through a preferred SIS facility;
FIG. 17 is a flow diagram generally representing the steps taken by the SIS facility to handle the read request represented in FIG. 16;
FIG. 18 is a flow diagram generally representing the steps taken by the SIS facility to handle a SIS link file close request; and
FIG. 19 is a flow diagram generally representing the steps taken by the SIS facility to handle a SIS link file delete request.

DETAILED DESCRIPTION OF THE INVENTION

Exemplary Operating Environment

FIG. 1 and the following discussion are intended to provide a brief general description of a suitable computing environment in which the invention may be implemented. Although not required, the invention will be described in the general context of computer-executable instructions, such as program modules, being executed by a personal computer. Generally, program modules include routines, programs, objects, components, data structures and the like that perform particular tasks or implement particular abstract data types. Moreover, those skilled in the art will appreciate that the invention may be practiced with other computer system configurations, including hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers and the like. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote memory storage devices.

With reference to FIG. 1, an exemplary system for implementing the invention includes a general purpose computing device in the form of a conventional personal computer 20 or the like, including a processing unit 21, a system memory 22, and a system bus 23 that couples various system components including the system memory to the processing unit 21. The system bus 23 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. The system memory includes read-only memory (ROM) 24 and random access memory (RAM) 25. A basic input/output system 26 (BIOS), containing the basic routines that help to transfer information between elements within the personal computer 20, such as during start-up, is stored in ROM 24. The personal computer 20 may further include a hard disk drive 27 for reading from and writing to a hard disk, not shown, a magnetic disk drive 28 for reading from or writing to a removable magnetic disk 29, and an optical disk drive 30 for reading from or writing to a removable optical disk 31 such as a CD-ROM, DVD-ROM or other optical media. The hard disk drive 27, magnetic disk drive 28, and optical disk drive 30 are connected to the system bus 23 by a hard disk drive interface 32, a magnetic disk drive interface 33, and an optical drive interface 34, respectively. The drives and their associated computer-readable media provide non-volatile storage of computer readable instructions, data structures, program modules and other data for the personal computer 20. Although the exemplary environment described herein employs a hard disk, a removable magnetic disk 29 and a removable optical disk 31, it should be appreciated by those skilled in the art that other types of computer readable media that can store data that are accessible by a computer, such as magnetic cassettes, flash memory cards, digital video disks, Bernoulli cartridges, random access memories (RAMs), read-only memories (ROMs) and the like may also be used in the exemplary operating environment.

A number of program modules may be stored on the hard disk, magnetic disk 29, optical disk 31, ROM 24 or RAM 25, including an operating system 35 (preferably Windows® 2000). The computer 20 includes a file system 36 associated with or included within the operating system 35, such as the Windows NT® File System (NTFS), one or more application programs 37, other program modules 38 and program data 39. A user may enter commands and information into the personal computer 20 through input devices such as a keyboard 40 and pointing device 42. Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, scanner or the like. These and other input devices are often connected to the processing unit 21 through a serial port interface 46 that is coupled to the system bus, but may be connected by other interfaces, such as a parallel port, game port or universal serial bus (USB). A monitor 47 or other type of display device is also connected to the system bus 23 via an interface, such as a video adapter 48. In addition to the monitor 47, personal computers typically include other peripheral output devices (not shown), such as speakers and printers.

The personal computer 20 may operate in a networked environment using logical connections to one or more remote computers 49. The remote computer (or computers) 49 may be another personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the personal computer 20, although only a memory storage device 50 has been illustrated in FIG. 1. The logical connections depicted in FIG. 1 include a local area network (LAN) 51 and a wide area network (WAN) 52. Such networking environments are commonplace in offices, enterprise-wide computer networks, Intranets and the Internet.

When used in a LAN networking environment, the personal computer 20 is connected to the local network 51 through a network interface or adapter 53. When used in a WAN networking environment, the personal computer 20 typically includes a modem 54 or other means for establishing communications over the wide area network 52, such as the Internet. The modem 54, which may be internal or external, is connected to the system bus 23 via the serial port interface 46. In a networked environment, program modules depicted relative to the personal computer 20, or portions thereof, may be stored in the remote memory storage device. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers may be used.

The present invention is described herein with reference to Microsoft Corporation's Windows 2000 (formerly Windows NT®) operating system, and in particular to the Windows NT® file system (NTFS). Notwithstanding, there is no intention to limit the present invention to Windows® 2000, Windows NT® or NTFS, but on the contrary, the present invention is intended to operate with and provide benefits with any operating system, architecture and/or file system.

The Groveler

Turning now to FIGS. 2A-2B, there is shown a general concept of a groveler 60 and a single instance store (SIS) facility and architecture underlying a preferred implementation of the present invention, which may be implemented


US 6,389,433 B1

in the computer system 20. In accordance with one aspect of the present invention and as described in detail below, as represented in FIG. 2A, in general, the groveler 60 finds files having duplicate data in a file system volume 62. Via a file system control named SIS_MERGE_FILES 64, the groveler 60 calls the Single Instance Store (SIS) facility 66 to merge the duplicate files into a single instance of data with links thereto. The SIS_MERGE_FILES control 64 may be implemented as a Windows 2000 file system control, recognized by the SIS facility 66. One such (SIS) facility 66 is described below, and is further described in copending United States Patent Application entitled "Single Instance Store for File Systems," assigned to the assignee of the present invention, filed concurrently herewith, and hereby incorporated by reference herein in its entirety. Note that alternatively, a user, via a SIS_COPYFILE request 68 to the SIS facility 66, may explicitly (manually) request that a source file be copied to a destination file as a SIS copy of the file.

As shown in FIGS. 2A and 2B, the groveler 60 finds, for example, that files 70, 72 (named Dir1\XYZ and Dir2\ABC) have duplicate data. Note that the files may be in separate directories of the volume 62 or in the same directory. When such duplicate files 70, 72 are identified, the groveler 60 calls the SIS facility 66 via the SIS_MERGE_FILES control request 64. As described below and as generally shown in FIG. 2B, the call to the SIS facility 66 normally results in a single instance representation 74 of the original files 70, 72 with links 76, 78 thereto, each link corresponding to one of the original files, e.g., the user sees each link file as if it was the original file. The common store file 74 is maintained in a common store directory 80 of such files.

Each SIS link file 76, 78 is a user file that is managed by the SIS facility 66, while the common store 80 is preferably a file system directory that is not intended to be visible or accessible to users. Note that the single instance representation 74 need not actually be a file system file in a file system directory, but may be stored in some other data structure. Thus, as used herein, the link file, common store file and/or single instance file components are intended to comprise any appropriate data structure that can hold at least part of a file's contents. Notwithstanding, the link files 76, 78 may be maintained on the same file system volume 62, as is the common store file 74 and the common store directory 80. This enables removable media to take the links and common store with it when removed, prevents formatting one volume (e.g., D:) from losing the common store file or links of another volume (e.g., C:), and so forth.

Repeating the SIS_MERGE_FILES (and/or SIS_COPYFILE) processes for any other files that have the same data will add links without substantially adding to the single instance of the file. In this manner, for example, an administrator user of a file server may place the links for many client users on each user's private directory, while maintaining only one instance of the file on the server. Note that it is feasible to have a SIS common store file with only one link thereto, while alternatively, a control may be implemented that allows more than two files to be specified at the same time for merging into a single instance representation thereof. As also described below, it also may occur that the groveler 60 detects a file (that is not a SIS link file) but already has a single instance representation of its data in the common store directory 80. In such an instance, the non-SIS link file may be converted (as described below) to a link to the existing single instance file.

In accordance with one aspect of the present invention and as generally represented in FIG. 3, to accomplish detection of duplicate files for merging purposes, the groveler 60 includes a central controller 82, preferably implemented as an instantiated object (e.g., a C++ object) with one or more defined interfaces thereto. The central controller 82 regulates the operation of one or more partition controllers 84C-84E (three are shown in FIG. 4), one partition controller per file system volume. In turn, the partition controllers 84C-84E each have a groveler worker 86C-86E associated therewith that, when activated, individually attempts to identify duplicate files in its corresponding file system volume.

The central controller 82 synchronizes the operation of the partition controllers 84C-84E across multiple volumes, for example, such that only one runs at a time depending on available system resources. In turn, when allowed to operate, each partition controller 84C-84E calls functions of its corresponding groveler worker 86C-86E.

As represented in FIG. 4, each groveler worker 86 is a single process, and includes an open function 88, close function 90, extract log function 92, scan volume function 94 and grovel function 96. In general, calling the open function 88 causes the groveler worker 86 to open (or create if needed) a database 100 of file information and a queue 102 of work items, also conveniently stored as a database. The close function 90 takes care of any cleanup operations before the groveler worker is shut down.

The extract log function 92 uses a USN (Update Sequence Number) log 104 to add items (file identifiers) to the work item queue 102. As is known, the USN log 104 is a function of the underlying NTFS filesystem 130 that dynamically records changes to a file system volume's files by storing change information indexed by a monotonically increasing sequence number, the USN. The extract log function 92 reads records from the USN log 104, each time starting from where it previously left off (as tracked by recording the USN), filters out those records that do not deal with new or modified files, and adds items to the work item queue 102 that correspond to new or modified files.

Calling the scan volume function 94 places work items (file identifiers corresponding to files in the volume) into the work item queue 102 via a depth first search of the file system volume 62. The scan volume function 94 is time limited, whereby when called, it places as many files as possible into the work item queue 102 within its allotted time, and resumes adding files from where it left off when called again. Note that the scan volume function 94 may be given some filtering capabilities, e.g., such that it will not add common store files to the work item queue 102; however, at present the scan volume function 94 merely adds file identifiers as items to the work item queue 102 and any filtering is performed when work items are dequeued. The scan volume function 94 is called only when needed, e.g., when the work item queue 102 is first created, or if a problem occurs with the USN log 104 or the database 100, since the extract log function 92 may not be provided with the proper file change information.

The grovel function 96 removes items from the work item queue 102, and processes each removed item to determine if it meets some criteria, e.g., whether the file corresponding to that work item has a duplicate file in the volume 62. To this end, the grovel function 96 computes a checksum (signature) from the file's data, and queries the file information database 100 via a database manager 106 (including a query engine) to see if one or more files in the volume have the same checksum and file size. At the same time, the database manager 106 updates the file information database 100 as needed with the file information, e.g., adds new records or changes existing records by storing or modifying
the file size and signature indexed by the file ID in the database 100. If at least one matching file is found, the grovel function 96 performs a byte-by-byte comparison with the matching file set to determine if one of the files is an exact duplicate, and if exact, calls the SIS facility 66 via the SIS_MERGE_FILES control 64 to merge the files. In this manner, duplicate files are automatically identified and combined in a rapid manner.

Turning to an explanation of the present invention with particular reference to the flow diagrams of FIGS. 5-9, the groveler 60 begins at step 500 when the central controller 82 determines which of the volumes are allowed to be groveled. A preferred way to determine when and how to run a background process (such as the way in which the groveler 60 is typically run) uses a technology sometimes referred to as "back off technology," that measures the actual performance of background tasks, including the performance of input/output (I/O) operations. The measurements are used to statistically determine when the background process is likely degrading the performance of a foreground process, in which event the background process temporarily suspends its execution, i.e., backs off. As a result, certain volumes may be too busy to grovel, for example, or locked by a disk utility, whereby only a subset of the total number of volumes may be groveled at a given time. Such back off technology is described in copending United States Patent Application entitled "Method and System for Regulating Background Tasks Using Performance Measurements," filed concurrently herewith, assigned to the assignee of the present invention and hereby incorporated by reference herein in its entirety. Notwithstanding, virtually any mechanism including CPU scheduling priority may be used to determine when to execute the groveler 60 on a volume, and moreover, the groveler 60 may be periodically run as a foreground process.

Once the set of volumes that can be groveled is determined at step 500, the open function of the groveler worker (e.g., object 86) is called at step 502 for one of the volumes, and as represented by step 508, is repeated for each volume. Note that although not shown, the groveler worker (object) 86 is instantiated (if not already instantiated) before calling its functions. For each volume, if at step 504 the result of the open call indicates a volume scan is needed, a scan is initiated at step 506 by setting a flag associated with the volume. After repeating this process via step 508 for each volume of the set that can be groveled, a main loop (FIG. 6) is entered, as described below.

FIG. 7 shows the general operation of the open function 88 for a given volume, beginning at step 700 where the groveler worker 86 attempts to open the file information database 100. If not successful as detected by step 702, the groveler worker 86 creates a new database 100 at step 704 for the volume along with a new work item queue 102 at step 706. The groveler worker 86 also stores the volume's current USN value at step 708, and sets a return code at step 710 to indicate that the partition controller 84 needs to call the scan volume function 94 to begin filling the work item queue 102 with the volume's file identifiers as described above. The current USN number is stored so that the extract log function 92 will be able to handle any file changes that happen after the volume scan is started.

FIG. 6 shows the main loop for operating the groveler functions. In each pass of the loop, for each volume it is determined at step 600 whether it is time for an extract log function call, time to grovel, or neither. If it is time for neither function, the process waits until the appropriate time is achieved, as represented by the dashed line "looping" back to step 600. Note that the time to grovel a volume is regulated via the aforementioned back-off technology, and thus ordinarily varies with respect to the activity of foreground processes so as to limit interference of the groveling operations therewith. Note that back-off technology also may vary the frequency of groveling based on other factors, such as to grovel more frequently when disk space is deemed low in an increased-priority effort to gain free space by merging files.

The extract log function 92 is called at a frequency that varies in an attempt to place a constant number of items in the work item queue per call, and thus the frequency of calling the extract log function 92 is based on the amount of file system volume activity taking place. For example, if disk activity is at a high rate, a large number of USN records will be extracted from the USN log 104, whereby the extract log function 92 will likely add a larger number of items to the work item queue 102 relative to times of slow disk activity. By using the number of records extracted during the most recent extract log function call to determine the time duration before the next call, a high number of extracted records will cause a higher rate of calling extract log, while a lower number will cause a lower rate of calling. Over a period of time, the changes to the rate of calling roughly provide the desired number of items being placed in the work item queue 102 per call. Note that the rates may be adjusted gradually to smooth out any abrupt changes in disk activity. This is done to trade off the expense of draining the USN log too frequently against the possibility that the log will overflow and force a potentially very expensive scan volume.

If it is time for the extract log function for any volume (e.g., the volume 62), the USN log for that volume 62 is checked at step 602 to determine whether it is correct. Note that the USN log is typically of a fixed size, and so only keeps a fixed number of entries, corresponding to the most recent updates to the volume, i.e., older entries are discarded. If there are more volume updates than space in the USN log between times when the groveler worker 86 looks at the log, it will miss some volume updates, and the USN log will be deemed to be incorrect. The USN log will also be deemed to be incorrect if it is corrupted, for instance by a data error on the underlying disk.

If at step 602 the USN log 104 is not correct, step 602 branches to step 608 to initiate a scan volume operation for that volume as described above. Step 610 resets the time for performing the next extract log function, as also described above.

If it is time to call the extract log function (step 600) and the USN log 104 is correct (step 602), the extract log function 92 is called at step 604 (as described via FIG. 8) to place items in the work item queue 102 corresponding to the USN entries since the last recorded USN record that was processed.

As represented in FIG. 8, the extract log function 92 begins by extracting a list of file identifiers corresponding to modified files from the USN log 104. Note that the application programming interface (API) or the like that allows the USN log 104 to be read may filter its return such that only selected types of files are returned, i.e., only those that are new or changed, and such that the same file is not listed multiple times. In addition, at step 802 the extract log function 92 may exclude certain files from the items to queue, such as common store files. For example, if a common store file is created from two identical files found by the groveler 60, appropriate entries will go into the USN log 104 to reflect this file system activity. These entries are flagged by the groveler 60 so that they are recognized and
not again processed by the groveler 60. Alternatively, these items may be filtered out by the USN retrieval process (API). Also, it is possible that the user or system may choose to exclude certain files (e.g., on a per file or per directory basis) from automatic merging, essentially overriding the groveler 60 for selected files. If any such files are identified, these files are excluded at step 804.
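
The exclusions of steps 802 and 804 can be illustrated with a minimal Python sketch. It is not the patented implementation: the `ChangeRecord` type, the `made_by_groveler` flag, the function name and the directory names are all hypothetical, chosen only to show groveler-flagged entries, common store files and user-excluded files being dropped before queueing, with each file identifier listed once.

```python
from dataclasses import dataclass
from pathlib import PureWindowsPath


@dataclass(frozen=True)
class ChangeRecord:
    """Hypothetical stand-in for one entry read from the USN log."""
    file_id: int
    path: str
    made_by_groveler: bool = False  # assumed flag set for the groveler's own activity


def _is_under(path, directory):
    # True when `directory` is an ancestor of `path` (case-insensitive on Windows paths).
    return directory in path.parents


def filter_candidates(records, common_store_dir, excluded_dirs=(), excluded_files=()):
    """Return file identifiers that should still be queued as work items."""
    store = PureWindowsPath(common_store_dir)
    skip_dirs = [PureWindowsPath(d) for d in excluded_dirs]
    skip_files = {PureWindowsPath(f) for f in excluded_files}
    seen = set()
    kept = []
    for rec in records:
        path = PureWindowsPath(rec.path)
        # Step 802 analogue: skip common store files and groveler-flagged entries.
        if rec.made_by_groveler or _is_under(path, store):
            continue
        # Step 804 analogue: skip files excluded per file or per directory.
        if path in skip_files or any(_is_under(path, d) for d in skip_dirs):
            continue
        # List each file only once.
        if rec.file_id in seen:
            continue
        seen.add(rec.file_id)
        kept.append(rec.file_id)
    return kept
```

Whether this filtering happens in the log-reading API or in the extract log function itself is left open by the text; the sketch simply applies every rule in one pass.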

At step 806, the remaining files are added as items to the 
work queue, identified by their volume-unique file identifier. 
At step 808, the extract log function 92 records the last USN 10 
handled so that the extract log function 92 will begin at the 
correct location the next time it is called. As described 
above, the extract log function 92 returns a count of the USN 
entries extracted, from which the partition controller 84 
calculates (at step 606 of FIG. 6) the next time to call the
extract log function 92. Steps 806 and 808 are handled as an 
atomic database transaction so that a system crash in the 
middle will not result in the pointer being updated without 
the items being added to the work queue. 
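
The atomic handling of steps 806 and 808 may be sketched as follows. This is a minimal illustration only, assuming a SQLite database as a stand-in for the groveler database; the table names work_queue and usn_state and both function names are hypothetical and do not appear in the description above.

```python
import sqlite3


def make_db():
    """Create an in-memory stand-in for the groveler database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE work_queue (file_id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE usn_state (id INTEGER PRIMARY KEY, last_usn INTEGER)")
    conn.execute("INSERT INTO usn_state VALUES (1, 0)")
    conn.commit()
    return conn


def record_extracted_entries(conn, file_ids, last_usn):
    """Steps 806 and 808 as one transaction: queue the items, then advance the pointer."""
    with conn:  # commits on success, rolls back if any statement raises
        conn.executemany(
            "INSERT OR IGNORE INTO work_queue (file_id) VALUES (?)",
            [(fid,) for fid in file_ids],
        )
        conn.execute(
            "UPDATE usn_state SET last_usn = ? WHERE id = 1", (last_usn,)
        )
```

Because both statements share one transaction, a failure between them leaves neither the queue nor the stored USN changed, so the next extract log call would re-read the same log entries rather than skip them.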

Returning to FIG. 6, following the extract log call, the
time for the next extract log is calculated at step 606, and the 
process loops back to step 600. After repeating the process 
until it is not the extract time for any volume, step 600 
branches to step 612 where the central controller 82 and/or 
partition controllers (e.g., 84C-84E) may determine which of
the volumes is the most important to grovel relative to 
others. One criterion for determining relative importance 
includes how much free space is left on a volume. Additional 
criteria may include when the volume was last groveled 
and/or the results of that grovel operation, so that, for
example, the same nearly-full volume is not always consid- 
ered the most important, particularly if there are no files to 
merge thereon. 
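
One way the relative importance of volumes might be weighed is sketched below. The scoring formula, the one-hour staleness scale and all names are assumptions made for illustration; the description above names the criteria (free space, time since the last grovel, results of that grovel) but does not specify how they are combined.

```python
from dataclasses import dataclass


@dataclass
class VolumeState:
    name: str
    free_fraction: float         # 0.0 means full, 1.0 means empty
    seconds_since_grovel: float
    last_grovel_merged: int      # files merged by the previous grovel pass


def importance(volume, staleness_scale=3600.0):
    """Hypothetical score: fuller volumes matter more, unless the last pass found nothing."""
    fullness = 1.0 - volume.free_fraction
    staleness = min(volume.seconds_since_grovel / staleness_scale, 1.0)
    # A volume whose last grovel merged nothing is discounted until time has passed.
    payoff = 1.0 if volume.last_grovel_merged > 0 else staleness
    return fullness * payoff


def pick_volume(volumes):
    """Return the volume that is currently the most important to grovel."""
    return max(volumes, key=importance)
```

Under this scoring, a nearly full volume that was groveled a minute ago with nothing to merge ranks below a half-full volume that recently yielded merges, and regains priority as time passes.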

Once the most important volume is selected, if at step 614 
a volume scan is in progress (has been initiated and not
completed) and the work queue is empty, the scan volume 
function 94 of the groveler worker 86 is called to begin/ 
continue a scan operation of the entire volume 62, i.e., to 
begin or continue filling the work item queue 102 with the 
volume's file identifiers as described above. The current
USN number is stored so that the extract log function 92 will 
be able to handle any file changes that happen after the 
volume scan is started. As mentioned above, the scan 
volume operation is time limited, presently 200 
milliseconds, although of course this time may be made
variable. For example, the number of items actually queued 
per call may be returned by the scan volume function and 
used to determine the time duration for the next call. 
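
A time-limited scan call of this kind may be sketched as follows. The function name, the use of an iterator to hold the scan position, and the injectable clock parameter are assumptions for illustration; only the 200 millisecond default and the returned item count come from the description above.

```python
import time


def scan_volume(file_ids, queue, budget_s=0.2, clock=time.monotonic):
    """Queue file identifiers until the time budget is spent.

    file_ids is an iterator, so an interrupted scan resumes where it stopped
    on the next call. The return value is the number of items queued.
    """
    deadline = clock() + budget_s
    queued = 0
    for file_id in file_ids:
        queue.append(file_id)
        queued += 1
        if clock() >= deadline:
            break
    return queued
```

The returned count could then be fed to whatever policy chooses the budget for the next call, for example lengthening the budget when few items were queued.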

If at step 614 a volume scan is not in progress, or if the 
work queue is not empty, the grovel function (FIG. 9) is
called at step 618, after which the process returns to the main 
loop at step 600. 

In the grovel function, as represented beginning at step 
900, the first item is dequeued from the work item queue 
102. More particularly, the work item queue 102 is conveniently
maintained as a database and accessed by the data-
base manager 106 using transactions, whereby each item is 
atomically handled or not handled. As a result, the work item 
is not considered removed until fully processed, whereby if 
a system failure (crash) occurs before complete processing,
the work item is not lost. However, for purposes of simplic- 
ity herein, the work item may be considered dequeued at this 
time, and step 902 then tests whether the dequeued item 
corresponds to a file that is a candidate for merging. For 
example, some files are considered too small to be worthwhile
merging, others may already be links or common store
files, while others may include reparse points (described
below) that make the file ineligible for SIS merging. If the
file is not a SIS candidate, step 902 branches ahead to step 
916 to determine if the time is expired for this call or if there 
are no more items to dequeue, whereby the grovel function 
96 will end. If instead the file is a SIS candidate, step 902 
branches to step 904 where the file has a checksum 
(signature) computed therefor comprising a hash function 
applied to selected blocks of data (e.g., one four-kilobyte
block from one-third of the way into the file, and a second 
four kilobyte block from two-thirds of the way into the file). 
As described below, this signature is computed in the same 
way SIS computes a signature for link files, although the 
present invention does not depend on this in any way, and
other signature algorithms may be employed. At step 906, 
the file information database 100 is queried via the database 
manager 106 to obtain a list of files that have the same file 
size and signature as the file corresponding to the currently 
dequeued item. 
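
The sampling described for step 904 may be illustrated as follows. This sketch assumes BLAKE2b truncated to 64 bits as a stand-in hash, since the hash function actually used by SIS is not reproduced here; the function name is hypothetical.

```python
import hashlib
import os

BLOCK_SIZE = 4096  # one four-kilobyte block per sample point


def file_signature(path):
    """Return (file size, 64-bit signature) from two sampled blocks.

    The blocks start one-third and two-thirds of the way into the file.
    BLAKE2b is a stand-in hash, not the hash used by SIS.
    """
    size = os.path.getsize(path)
    digest = hashlib.blake2b(digest_size=8)
    with open(path, "rb") as f:
        for offset in (size // 3, (2 * size) // 3):
            f.seek(offset)
            digest.update(f.read(BLOCK_SIZE))
    return size, int.from_bytes(digest.digest(), "big")
```

The returned pair can serve as the lookup key for the query at step 906. Because only two blocks are sampled, files that differ elsewhere produce the same key, which is why a full comparison still follows any match.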

If any matching files are returned as determined by step 
908, step 910 takes an item from the match list and fully 
compares each byte in the matching file with each byte in the 
file corresponding to the work queue item to determine if the 
files are exact duplicates. Note that although not shown,
some optimizations may be employed. For example, if the 
database manager 106 retrieves any SIS link files as match- 
ing the signature and file size, those may be tested first, 
actually using the common store file data for the full 
comparison. This is generally preferable to using the link file 
or a normal file because the link or normal file might be in 
use, in which event the groveler would have to postpone the 
comparison. The common store file is not busy in this way, 
and thus does not manifest this behavior. Moreover, to avoid
a performance impact on other processes via pollution of the 
disk buffer cache, file reads are non-cached. To avoid 
interfering with foreground processes via file locking 
conflicts, opportunistic locks are used by the groveler 60 
when accessing a file, which temporarily suspend access to 
the file by another process until the groveler 60 can release 
it. 
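
The full comparison of steps 908 through 912 may be sketched as follows. The function names and the chunk size are hypothetical, and the sketch uses ordinary buffered reads; the non-cached reads and opportunistic locks mentioned above are platform facilities that are not modeled here.

```python
import os


def files_identical(path_a, path_b, chunk_size=65536):
    """Compare two files byte for byte, reading a chunk at a time."""
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            block_a = a.read(chunk_size)
            block_b = b.read(chunk_size)
            if block_a != block_b:
                return False
            if not block_a:  # both files reached end of file together
                return True


def first_exact_match(candidate, match_list):
    """Return the first file in match_list identical to candidate, else None."""
    for other in match_list:
        if other != candidate and files_identical(candidate, other):
            return other
    return None
```

Stopping at the first exact match mirrors the flow above, in which the comparison ends as soon as a merge can take place.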

If at step 912 a potentially matching file does not match, 
the process continues until the match list is empty as 
determined by step 908 or until an exact match is found at 
step 912. The first exact match that is found ends the 
comparison, as a merge may then take place at step 914, as 
described below. As shown in FIG. 9, the grovel function 96 
continues to dequeue items from the work item queue 102 
until the work item queue 102 is empty or the time expires. 

Following the calling of the grovel function 96 at step 
618, the grovel function 96 may be called as many times as 
needed until the partition controller 84 is halted by the 
central controller 82, e.g., by a call thereto or by expiration 
of a time setting. 

As can be readily appreciated, the partition controller 84 
interleaves calls to the scan volume function 94, extract log
function 92 and grovel function 96 while the scan volume 
operation is not complete. Once the scan volume operation 
is complete, the extract log function 92 and grovel function 
96 are interleaved to respectively add items to the work 
queue 102 as files are created and/or modified, and remove 
those items from the queue 102 in search of duplicates. The 
interleaving of the scan volume function 94 that adds items 
to the work item queue 102 with the function that removes 
items from the work item queue (the grovel function 96) 
avoids the need to allocate a very large queue 102 for the 
many files possible in a file system volume 62. The extract 
log function runs to completion once it is started in order to 
prevent the USN log from overflowing and thus forcing an 
expensive scan volume process. 
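
The effect of interleaving on queue size may be illustrated with a small simulation. This is a sketch under simplifying assumptions: batch sizes stand in for the time limits, the extract log function is omitted, and the names are hypothetical. The scan is called only when a scan is in progress and the queue is empty, as at step 614; otherwise items are groveled.

```python
from collections import deque
from itertools import islice


def run_partition(all_file_ids, scan_batch, grovel_batch, process):
    """Alternate scan and grovel calls; return the peak queue length seen."""
    pending = iter(all_file_ids)   # position of the volume scan
    queue = deque()                # the work item queue
    scan_in_progress = True
    peak = 0
    while scan_in_progress or queue:
        if scan_in_progress and not queue:
            batch = list(islice(pending, scan_batch))
            if batch:
                queue.extend(batch)
            else:
                scan_in_progress = False  # the whole volume has been scanned
            peak = max(peak, len(queue))
        else:
            for _ in range(min(grovel_batch, len(queue))):
                process(queue.popleft())
    return peak
```

In this simulation the queue never holds more than one scan batch, however many files the volume contains.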
SINGLE INSTANCE STORE

The present invention has been described with reference
to identifying duplicate files for merging into a single
instance store with links thereto, and a system for performing
the merging of duplicate files is described herein.
However, as can be readily appreciated, the present invention
may have numerous applications in identifying files
with similar properties, not only files that are exact duplicates
of one another. For example, the present invention may
be used to find files that although not having exactly
duplicated data, are very similar to one another, e.g., have
differences below some threshold number of differences. A
mechanism that commonly stored the similar data as a file,
with separate links thereto along with the file deltas, may be
alternatively utilized to combine file data.

For efficiency, the SIS facility 66 may be built into the file
system. However, although not necessary to the present
invention, primarily for flexibility and to reduce complexity,
it is preferable in the Windows 2000 environment to implement
the SIS facility 66 as a filter driver 66' (FIG. 12).
Indeed, the present invention was implemented without
changing the Windows NT® file system (NTFS).
Notwithstanding, it will be understood that the present
invention as described above is not limited to the NTFS filter
driver model.

In the NTFS environment, filter drivers are independent,
loadable drivers through which file system I/O (input/
output) request packets (IRPs) are passed. Each IRP corresponds
to a request to perform a specific file system
operation, such as read, write, open, close or delete, along
with information related to that request, e.g., identifying the
file data to read. A filter driver may perform actions to an IRP
as it passes therethrough, including modifying the IRP's
data, aborting its completion and/or changing its returned
completion status.

The SIS link files 76-78 do not include the original file
data, thereby reclaiming disk space. More particularly, the
link files are NTFS sparse files, which are files that generally
appear to be normal files but do not have the entire amount
of physical disk space allocated therefor, and may be
extended without reserving disk space to handle the extension.
Reads to unallocated regions of sparse files return
zeros, while writes cause physical space to be allocated.
Regions may be deallocated using an I/O control call,
subject to granularity restrictions. Another I/O control call
returns a description of the allocated and unallocated regions
of the file.

The link files 76, 78 include a relatively small amount of
data in respective reparse points 110, 112, each reparse point
being a generalization of a symbolic link added to a file via
an I/O control call. As generally shown in FIG. 10, a reparse
point (e.g., 110) includes a tag 114 and reparse data 116. The
tag 114 is a thirty-two bit number identifying the type of
reparse point, i.e., SIS. The reparse data 116 is a variable-
length block of data defined by and specific to the facility
that uses the reparse point 110, i.e., SIS-specific data, as
described below.

FIGS. 11A-11B represent the general flow of operation
when the groveler 60 makes a SIS_MERGE_FILES control
request 64 to SIS to merge duplicate files via the SIS
driver 66'. The SIS driver 66' receives such requests, and at
step 1100 determines whether the matching file is already a
SIS link file. If the matching file is a SIS link file, step 1100
branches to step 1104 to handle the merge depending on
whether the file corresponding to the work item is also a SIS
link, as described below.

In the event that the matching file is not a SIS link, step
1100 branches to step 1102 to determine if the file corresponding
to the work item is a SIS link. If not, step 1102
branches to step 1106 where the contents of the matching file
are copied as file data 118 to a newly allocated file (e.g., 74)
in the common store 80 (FIG. 2A). Note that for efficiency,
SIS (and/or the groveler 60) may employ some threshold
size test before making the copy. Further, note that the
SIS_MERGE_FILES control does an actual copy of the contents
of the matching file to the common store, rather than a
rename of the matching file. The link file representing the
[text illegible in source] so that open requests directed to the
NTFS file ID are to the link file rather than to the common
store file. This file ID number is used by SIS to identify the
file, whereby any user-renaming of the link file by the user is
not an issue. In an alternate embodiment, SIS could use rename
in order to avoid copying the file data, possibly at the cost of
having the source file's file ID change because of the copy
operation, or by having support for a rename operation that
leaves the file IDs unchanged in the underlying NTFS 130.

The common store file 74 in the common store 80 is
named based upon a 128-bit universal unique identifier
(UUID), shown in FIGS. 2A-2B as the file CommonStore\(UUID1).
Using a UUID is particularly beneficial when
backing up and restoring SIS files, since files with the same
UUIDs are known to be exact copies, and more than one
such copy is not needed in the common store 80.

While not shown in FIG. 11A, if a copying error occurs,
the matching file remains unchanged, an appropriate error
message is returned to the requesting user, and the
SIS_MERGE_FILES control 64 is terminated. In the normal
event where there are no errors in the copying process, step
1106 continues to step 1108 where the matching file is
converted to a SIS link file (e.g., the link file 76, FIG. 2B).

To convert a file to a SIS link file at step 1108, the
SIS_MERGE_FILES control 64 provides the reparse point
110, including the SIS tag 114, and reparse data 116 including
the common store file's unique file identifier 120 and a
signature 122 (FIG. 10). The signature 122 is a 64-bit
checksum computed by applying a trinomial hash function
(known as the 131-hash) to the file data. The common store
file 74 maintains the signature therewith as part of a backpointer
stream 124, described below. The only way to
determine the signature is via the file data contents, and thus
the signature may be used to provide security by preventing
unauthorized access to the contents via non-SIS created
reparse points as described below.

As another part of the conversion to a link file 76 at step
1108, the data of the file is cleared out using the aforementioned
NTFS sparse file technology. The resulting link file
76 thus essentially comprises the reparse point 110 and a
shell for the data. At step 1110, the link file 78 is created for the
file corresponding to the work item in the same general
manner, i.e., the link file 78 comprises a reparse point 112
having the same information therein and a shell for the data.
Each link file is on the order of approximately 300 bytes in
size.

Step 1114 represents the adding of identifiers of any new
SIS link files (converted via steps 1108 and/or 1110) to a
backpointer stream 124 maintained in the common store file
74. As described in more detail below, the backpointers
identify to the common store file 74 the link files that point
to it. As also described below, backpointers are particularly
useful in delete operations, i.e., delete the backpointer when
the link file is deleted, but only delete the common store file
when it has no more backpointers listed in the stream 124.

At this time, the common store file 74 and the links 76, 78
thereto are ready for use as SIS files, and the files are closed
as appropriate (step 1116).

Alternatively, if at step 1102 the file corresponding to the
work item is already a SIS link file, there is no need to create
another common store file to merge the files. Step 1102 thus
branches to step 1112 where the matching file is converted
to a link file as described above with reference to step 1108.
Step 1112 then continues to step 1114 to add the backpointer
of the link file converted from the matching file to the
common store file, and then the files are closed as appropriate
at step 1116.

Returning to step 1100, in the event that the matching file
is a SIS link file, step 1100 branches ahead to step 1104 to
determine if the file corresponding to the work item is also
a SIS link file. If at step 1104 the file corresponding to the
work item is not a SIS link, step 1104 branches to perform
steps 1110-1116 to convert the file corresponding to the
work item to a SIS link file in the manner described above.
If instead the file already is a SIS link file, i.e., both files are
SIS link files, step 1104 branches to step 1120 of FIG. 11B.

At step 1120 of FIG. 11B, the link files are evaluated to
determine if they refer to the same common store file. If so,
there is nothing to merge, and thus the process ends by
returning to step 1116 (FIG. 11A) and closing any files as
appropriate. If the two files do not refer to the same common
store file, then the work item's corresponding file is converted
to point to the common store file to which the
matching store file points, and the corresponding common
store files are appropriately modified. Note that this situation
is possible if the user creates links by using the
SIS_COPYFILE method.

More particularly, to properly handle the conversion of
the work item's corresponding file and the fixup of the
common store files, as represented by step 1122, the backpointer
for the file corresponding to the work item is
removed from its corresponding common store file. To
adjust the file corresponding to the work item, the reparse
point of this link file is converted to point to the common
store file referred to by the matching file, as represented by
step 1124. Also, as represented by step 1126, a backpointer
to the file that corresponds to the work item is added to the
common store file that is referred to by the matching link
file. The process then returns to FIG. 11A to close the files
as appropriate at step 1116.

Turning to FIGS. 12 and 13, there is provided an explanation
of how a request to open a link file is handled by the
SIS/NTFS architecture. As shown in FIG. 12, an open
request in the form of an IRP (including a file name of a file
that has a SIS reparse point), as represented by the arrow
with circled numeral one, comes in as a file I/O operation
and is passed through a driver stack. The driver stack
includes the SIS filter driver 66' with other optional filter
drivers 126, 128 possibly above and/or below the SIS filter
driver 66'. For purposes of the examples herein, these other
filter drivers 126, 128 (shown herein for completeness) do
not modify the IRPs with respect to SIS-related IRPs. At this
time, the SIS filter driver 66' passes the IRP on without
taking any action with respect thereto, as it is generally not
possible to determine if a given filename corresponds to a
file with a reparse point until NTFS processes the open
request.

When the SIS link open IRP reaches the NTFS 130, the
NTFS 130 recognizes that the file named in the IRP has a
reparse point associated therewith. Without further
instruction, the NTFS 130 does not open files with reparse
points. Instead, the NTFS 130 returns the IRP with a
STATUS_REPARSE completion error and with the contents
of the reparse point attached, by sending the IRP back
up the driver stack, as represented in FIG. 12 by the arrow
with circled numeral two. As represented in FIG. 13A, at
step 1300 the SIS filter 66' receives the STATUS_REPARSE
error and recognizes the IRP as having a SIS
reparse point.

In response, via steps 1302-1304, the SIS filter 66' opens
the common store file 74 identified in the reparse point if the
common store file 74 is not already open, and reads the
signature therein. This is accomplished by the SIS filter 66'
sending separate IRPs to NTFS 130 identifying the common
store file by its UUID name 120 (FIG. 10) in the reparse
point 110, and then requesting a read of the appropriate data.
Then, at step 1306, if the open proceeded correctly, the SIS
filter 66' compares the signature 122 in the reparse point with
the signature in the backpointer stream 124 of the common
store file 74. If they match, step 1306 branches to step 1320
of FIG. 13B as described below. However, if the signatures
do not match, the SIS filter 66' allows the open to proceed
by returning a file handle to the link file to the user, but
without attaching SIS context to the opened file, essentially
denying access to the common store file 74 for security
reasons.

More particularly, a SIS reparse point may be generated
external to SIS, including the UUID-based name of a
common store file, a name which can be guessed in a
relatively straightforward manner. As a result, without the
signature check, such an externally-generated reparse point
could give potentially unauthorized access to the common
store file. However, since the SIS-reparse point has a
signature, and the signature may only be computed by
having access to the file data, only those who already have
access to the file data can know the signature and provide a
valid SIS-reparse point. The file data in the common store is
thus as secure as the file data was in the original source file.

If the signature does not match at step 1306, step 1308
returns access to the link file without corresponding access
to the common store file to the user. Step 1310 then tests to
see if another link file has the common store file open, and
if not, step 1312 closes the common store file 74. More
particularly, SIS maintains a data object that represents the
common store file, and the common store file data object
keeps a reference count of open link files having a reference
thereto. Step 1310 essentially decrements the reference
count and checks to see if it is zero to determine whether it
needs to close the common store file handle. Note that valid
users are thus not stopped from working with their valid
links to the common store file 74 if an invalid reparse point
is encountered during the valid users' sessions.

If the signatures match at step 1306, at step 1320 the SIS
filter driver 66' sets a FILE_OPEN_REPARSE_POINT
flag in the original link file open IRP, and returns the IRP to
the NTFS 130, as shown in FIG. 12 by the arrow with circled
numeral three. This flag essentially instructs the NTFS 130
to open the link file 76 despite the reparse point. As shown
in FIG. 12 by the arrow with circled numeral four, the NTFS
130 returns success to the SIS filter 66' along with a file
object having a handle thereto (assuming the open was
successful). At step 1322 of FIG. 13B, when the success is
received, the SIS filter driver 66' attaches context 132 (FIG.
2B) to the file object, including a context map 134 (FIG. 10)
that will be used to indicate any portions of the link file that
have been allocated to data. Note that the context 132 is an
in-memory structure and only attached while the file is open,
and is thus represented by a dashed box in FIG. 2B to reflect
its transient nature. If the link file has any allocated data
portions, those portions are marked in the map 134 in the
context as "dirty" at step 1322. A link file having allocated
data when first opened is a special case situation that occurs,
for example, when the disk volume 62 was full, as described
below.

At step 1326, a check is made to ensure that the link file's
identifier is listed among the backpointers in the backpointer
stream 124 of the common store file 74. It is possible for the
list of backpointers in the stream 124 to become corrupted
(e.g., when the SIS filter driver 66' is not installed) whereby
the link file 76 is not listed. If not listed at step 1326, the link
file's identifier, which is known to identify a valid link, is
added to the list of backpointers 124 at step 1328, and a
volume check procedure 136 (FIG. 2B) is started at step
1330 (unless already running). The volume check 136
essentially works with the backpointer streams of the various
common store files (UUID1-UUIDn) so that common
store files do not contain backpointers to link files that do not
exist, so that common store files do not remain and use disk
space without at least one link pointing thereto, and so that
each valid link file has a backpointer in the corresponding
common store file. At step 1332, if volume check 136 is
running, a check bit, used by the volume check 136, is set
to one in the backpointer for the file each time that link file
is opened. The volume check 136 and check bit are
described in the aforementioned copending United States
Patent Application entitled "Single Instance Store for File
Systems."

At step 1334, the handle to the link file is returned to the
user, shown in FIG. 12 by the arrow with circled numeral
five. Note that the user thus works with the link file 76, and
generally has no idea that the link file 76 links the file to the
common store file 74. At this time, assuming the signature
was correct and the opens were successful, the user has a
handle to the link file 76 and the common store file 74 is
open.

Writing to a SIS link file 76 does not change the common
store file 74, since other links to the common store file 74 are
logically separate. Instead, write requests are written to
space allocated therefor in the link file 76, as described
below. In this manner, changing the data via one link does
not result in changes seen by the other links. Thus, by
"logically separate" it is meant that in a SIS link, changes
made to one link file are not seen by users of another link
file, in contrast to simply having separate file names,
protections, attributes and so on. If two users open the same
link file, they will see one another's changes.

FIGS. 14 and 15 describe how the SIS filter 66' handles
a write request to the open link file 76. As shown in FIG. 14,
the SIS write request comes through the driver stack to the
SIS filter driver 66' as an IRP, including the file handle and
attached context 132. The IRP designates the region of the
file to be written and identifies the location of the data to
write. The SIS filter driver 66' can recognize the context 132
as belonging to SIS, but because the write is directed to the
link file 76, SIS lets the IRP pass to the NTFS 130 as shown
in FIG. 14 by the arrow with circled numeral one and in FIG.
15 as step 1500. NTFS attempts the write, allocating appropriate
space in the link file 76, and SIS receives a status from
the NTFS at step 1502 (the arrow with circled numeral two
in FIG. 14). If the write failed, e.g., the disk is full and the
space could not be allocated, step 1504 branches to step
1506 where the error is returned to inform the user.

If the write was successful, step 1504 branches to step
1508 where the SIS filter driver 66' marks the region that
was written as dirty in the context map 134 of the context
132, while step 1510 then reports the successful write status
to the user. In this manner, SIS tracks which part of the file
data is current in the common store file 74 and which part is
current in the link file 76. By way of example, consider a
user requesting to write ten kilobytes of data beginning at
offset one megabyte, as generally shown in FIG. 10. The
NTFS 130 allocates the space, unless already allocated, in
the appropriate region 138 of the link file's (sparse) data
space 140 (note that the NTFS actually allocates space in
64-kilobyte blocks). SIS then marks the context map 134 to
reflect this dirty region, as shown in FIG. 10. Note that since
the changes are not written to the common store file 74, the
changes written to one link file are not seen by any other link
to the common store file 74.

SIS thus lets NTFS 130 handle the allocation of the space
in the sparse file and the writing thereto. However, if SIS is
implemented in a file system that did not have sparse file
capabilities, SIS could perform the equivalent operation by
intercepting the write request and writing the data to a
temporary file. Upon closing the "changed" link file, SIS
need only copy the clean data from the common store file to
the temporary file, delete the link file and rename the
temporary file with the name of the link file to achieve the
logical separation of files in a transparent manner.

FIGS. 16 and 17 describe how the SIS filter 66' handles
a read request to the open link file 76. As shown in FIG. 16,
the SIS read request comes through the driver stack to the
SIS filter driver 66' as an IRP, including the file handle and
attached context. The SIS filter driver 66' recognizes the
attached context 132 as belonging to SIS, and intercepts the
IRP, shown in FIG. 16 by the arrow with circled numeral
one.

As shown in step 1700 of FIG. 17, the SIS filter driver
initially examines the map 134 in the attached context 132
to determine if any of the link file is marked as dirty, i.e.,
allocated to file data. Step 1702 then compares the region
that the IRP is specifying to read against the map 134, and
if the read is to a clean region, step 1702 branches to step
1704. At step 1704, SIS converts the link file read request to
a common store file read request IRP and passes the modified
IRP to the NTFS 130 as also shown by the arrow
accompanied by the circled numeral 2a in FIG. 16. The
NTFS 130 responds with the requested data (or an error) as
shown in FIG. 16 by the arrow with circled numeral 3a. The
data (or error) is then returned to the user at step 1716 of
FIG. 17 (circled numeral 4 in FIG. 16). Note that to the user,
the request appears to have been satisfied via a read to the
link file, when in actuality the SIS filter 66' intercepted the
request and converted it to a request to read from the
common store file 74.

Returning to step 1702, it is possible that via a write
operation to the link file, some of the data requested to be
read is from a "dirty" region, that is, one that has been
allocated and written to while the link file was open (or that
was allocated on the disk when the link was first opened in
step 1322). As described above, write requests cause space
to be allocated in the link file 76 to provide an actual region
to maintain the current state of the changed data. At step
1702, if a requested region to read is marked as dirty, step
1702 branches to step 1706 to determine if the entire read is
from a dirty region or spans both dirty and clean regions.

If the entire region is dirty, then the SIS filter 66' passes
the read request IRP to the NTFS 130 whereby the link file
76 is read at step 1708 and returned to the SIS filter 66'. This
is represented in FIG. 16 by the arrows designated with
circled numerals 2b and 3b. The data (or error) is then
returned to the user at step 1716 of FIG. 17 (circled numeral
4 in FIG. 16). In this manner, the user receives the current
changes that have been written to the link file rather than the
stale data in the common store file 74.
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Alternatively, if step 1706 detects that the user is request- 
ing both clean and dirty regions, the SIS filter 66' splits up 
the read request into appropriate requests to read the dirty 
region or regions from the link file 76 and the clean region 
or regions from the common store file 74. To this end, at 5 
steps 1710 and 1712, the SIS filter 66' uses the map 134 to 
generate one or more IRPs directed to reading the common 
store file 74 and passes at least one IRP directed to reading 
the link file 76 and at least one IRP directed to reading the 
common store file 74 to the NTFS 130. This is represented 10 
in FIG. 16 by arrows labeled with circled numerals 2a and 
2b. Assuming no read errors, step 1714 merges the read 
results returned from the NTFS 130 (in FIG. 16, the arrows 
labeled with circled numerals 3a and 3b) into a single result 
returned to the user at step 1716 (the arrow labeled with 15 
circled numeral 4). Note that any read error will result in an 
error returned to the user, although of course SIS may first 
retry on an error. By appropriately returning the current data 
in response to a read request from either the common store 
file 74 or the link file 76, or both, SIS maintains the logical
separation of the link files in a manner that is transparent to 
the requesting user. 
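The read splitting of steps 1706 through 1716 can be sketched as follows. This is a minimal illustration only, assuming the map 134 can be modeled as a sorted list of non-overlapping dirty byte ranges; the names `split_read` and `read_merged` are hypothetical, and in-memory byte strings stand in for the IRPs sent to NTFS.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open byte range [start, end)


def split_read(offset: int, length: int,
               dirty: List[Range]) -> List[Tuple[str, int, int]]:
    """Split a read into per-source segments, in file order.

    `dirty` is assumed sorted and non-overlapping. Dirty bytes come from
    the link file ("link"), clean bytes from the common store ("common").
    """
    if length <= 0:
        return []
    end = offset + length
    pos = offset
    segments = []
    for d_start, d_end in dirty:
        if d_end <= pos:
            continue            # dirty range lies before the read
        if d_start >= end:
            break               # dirty range lies after the read
        if d_start > pos:
            segments.append(("common", pos, d_start))
            pos = d_start
        seg_end = min(d_end, end)
        segments.append(("link", pos, seg_end))
        pos = seg_end
    if pos < end:
        segments.append(("common", pos, end))
    return segments


def read_merged(offset: int, length: int, dirty: List[Range],
                link_data: bytes, common_data: bytes) -> bytes:
    """Merge the per-source reads into the single result the user sees."""
    sources = {"link": link_data, "common": common_data}
    return b"".join(sources[src][s:e]
                    for src, s, e in split_read(offset, length, dirty))
```

For instance, with a dirty range of bytes 4 to 7, a read of six bytes at offset 2 yields a clean segment, a dirty segment, and a second clean segment, which are then concatenated in order.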

FIG. 18 represents the steps taken when a request to close 
the handle to the link file 76 is received and the handle is 
closed at step 1800. At step 1802, a test is performed to see
if this was the last handle currently open to this link file. If 
not, the process ends, whereby the link file is left open for 
operations via the other open file handles. If instead this was 
the last open handle, step 1804 makes a determination (via 
the context map 134) if any portion of the link file 76 is
marked as dirty (allocated). If not, the driver 66' requests
closing of the common store file handle, whereby steps 1806 
and 1808 cause the common store file 74 to be closed if no 
other links have the common store file 74 open, otherwise 
the common store file 74 remains open for the other links to
use. Conversely, at step 1804, if any region of the link file 
76 was written to and is thus marked as dirty, step 1804 
branches to step 1810 since the link file may no longer be 
properly represented by the common store file 74. Note that 
steps 1810 and below may take place after the link file
handle has been closed, by doing the work in a special 
system context. This allows the users to access the SIS file 
while the copyout of clean data is in progress. Step 1810 
copies the clean portions from the common store file 74 to 
space allocated therefor in the link file 76. If successful at
step 1812, the now fully-allocated link file is converted back 
to a regular file at step 1814, essentially by removing the 
reparse point. In this manner, logically independent links to 
the common store file are supported, as the changes made to
one link file are not seen via any other link file. The link file
76 is then deleted from the list of files in the backpointer 
stream as described below with reference to FIG. 19, which 
may further result in the common store file being deleted. 
The process then continues to steps 1806 and 1808 to close 
the common store file if no other links have it open. Note
that the handle to the common store file needs to be closed 
even if the common store file was deleted. 

However, it is possible that the clean data from the 
common store file 74 could not be copied back, particularly 
if the space therefor could not be allocated in the link file 76
due to a disk full condition. If such an error occurs, step 1812 
branches to step 1816 which represents the canceling of the 
copyout and leaving the link file 76 as is, preserving the 
written data. Note that this will not cause a disk full 
condition because the space was already allocated to the link
file during the earlier write request without an error, other- 
wise the write request that caused the space to be allocated 
would have failed and the user notified (FIG. 15, steps
1504-1506). As described above, when the link file is 
re-opened, step 1322 of FIG. 13B will mark the allocated 
portions of the link file 76 as dirty in the map 134, whereby 
the changes are properly returned when the file is read. Step 
1816 then continues to steps 1806 and 1808 to close the 
common store file if no other links have it open. 
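The close handling of FIG. 18 can be sketched as a small state model. This is an illustrative sketch under stated assumptions: the classes `CommonStore` and `LinkFile`, the `free_bytes` parameter, and the returned outcome strings are all hypothetical, and the copyout is reduced to a space check rather than an actual data copy.

```python
from dataclasses import dataclass


@dataclass
class CommonStore:
    open_links: int = 0          # number of link files holding the store open

    @property
    def is_open(self) -> bool:
        return self.open_links > 0


@dataclass
class LinkFile:
    store: CommonStore
    size: int
    dirty_bytes: int = 0         # bytes written (allocated) in the link file
    open_handles: int = 0
    is_link: bool = True         # False once reconverted to a regular file


def open_handle(link: LinkFile) -> None:
    if link.open_handles == 0:
        link.store.open_links += 1
    link.open_handles += 1


def close_handle(link: LinkFile, free_bytes: int) -> str:
    link.open_handles -= 1                    # step 1800
    if link.open_handles > 0:                 # step 1802: not the last handle
        return "still-open"
    outcome = "clean"
    if link.dirty_bytes > 0:                  # step 1804: something was written
        clean_bytes = link.size - link.dirty_bytes
        if clean_bytes <= free_bytes:         # steps 1810-1812: copyout succeeds
            link.is_link = False              # step 1814: remove the reparse point
            outcome = "converted"
        else:                                 # step 1816: disk full, keep deltas
            outcome = "copyout-cancelled"
    link.store.open_links -= 1                # steps 1806-1808
    return outcome
```

In this model the common store stays open for as long as any other link still holds it, which mirrors the check made at steps 1806 and 1808.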

In a similar manner to the disk full condition, it is thus 
possible in general to employ the SIS architecture to use the 
link file 76 to maintain changes (deltas), with the unchanged 
clean regions backed up by the common store file 74. To this 
end, instead of copying the clean portions from the common 
store file and reconverting the link file to a regular file when 
the file is closed, SIS may keep the link file as a link file with 
whatever space is allocated thereto. Some criteria also may 
be used to determine when it is better to convert the link file 
back to a regular file. For example, a threshold test as to the 
space saved may be employed to determine when to return 
a link file to a regular file versus keeping it as a link, whereby 
only link files with relatively small deltas would be main- 
tained as link files. As a result, SIS may provide space 
savings with files that are not exact duplicates, particularly 
if the file contents are almost exactly identical. As mentioned 
above, the groveler 60 may also identify near-duplicate files 
for merging in this manner. Notwithstanding, at present SIS 
preferably employs the copy-on-close technique of FIG. 18, 
since updates of SIS files and/or writes thereto are likely to 
be relatively rare. 
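One way to picture the threshold test mentioned above is the following sketch. The function name `keep_as_link` and the default threshold of one half are assumptions made purely for illustration; the description does not specify particular criteria.

```python
def keep_as_link(file_size: int, delta_bytes: int,
                 min_saved_fraction: float = 0.5) -> bool:
    """Return True if the link file should remain a link.

    The space saved is the part of the file still backed by the common
    store; the 0.5 default is a hypothetical threshold.
    """
    if file_size <= 0:
        return False
    saved = file_size - delta_bytes
    return saved / file_size >= min_saved_fraction
```

With this rule, a 1000-byte file with a 50-byte delta would stay a link, while one with a 900-byte delta would be returned to a regular file.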

Turning to FIG. 19, there is shown a process employed by 
SIS after a link file is deleted (e.g., by file I/O) or recon- 
verted to a regular file (e.g., by the SIS close process). When 
a SIS link is deleted or reconverted to a regular file, the 
common store file 74 corresponding to that SIS link file is 
not necessarily deleted because other links may be pointing 
to that common store file 74. Thus, at step 1902, the 
backpointer stream 124 is evaluated to determine if the 
deleted backpointer was the last backpointer remaining in 
the stream, i.e., there are no more backpointers. If it is not 
the last backpointer, then there is at least one other link file 
pointing to the common store file 74, the common store file 
74 is thus still needed, and the process ends. In this manner, 
logically independent links to the common store file are 
again supported, as deleting one link file does not affect any 
other link file. 

If no backpointers remain at step 1902, this generally 
indicates that no link files are pointing to the common store 
file and thus the common store file is no longer needed. 
However, before deleting the common store file, step 1902 
branches to step 1904 where a test is performed as to 
whether the volume check procedure 136 is running. If so, 
there is a possibility that the backpointer stream is corrupted, 
as described below. If the volume check is not currently 
running, step 1904 advances to step 1908 to delete the 
common store file (after first closing it, if necessary). 
Otherwise, since the backpointer stream is not necessarily 
trustworthy, step 1904 branches to step 1906 where it is 
determined whether the volume check 136 is calling this 
delete procedure, i.e., whether the steps of FIG. 19 are being 
invoked from the volume check. If the volume check is not 
calling to delete the file, step 1906 ends the process without 
deleting the file, otherwise step 1906 branches to step 1908 
to delete the file. Step 1906 thus enables the volume check 
136 to delete a common store file when the volume check 
has concluded that the backpointer stream is correct and no 
link files point thereto. 

In sum, step 1908 deletes the common store file when the 
backpointer stream is both empty and trusted, thereby 
reclaiming the disk space. Note that instead of backpointers,
counts of the links may be alternatively used for this 
purpose, i.e., delete the common store file when a count of 
zero links thereto remain. Backpointers are preferable, 
however, primarily because they are more robust than
counts. 
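The deletion decision of FIG. 19 reduces to a short predicate, sketched below. The function name and its parameters are hypothetical, and the backpointer stream 124 is modeled as a plain set of link identifiers.

```python
def should_delete_common_store(backpointers: set, removed_link: str,
                               volume_check_running: bool,
                               called_by_volume_check: bool) -> bool:
    """Decide whether the common store file may be deleted."""
    backpointers.discard(removed_link)
    if backpointers:                  # step 1902: other links still point here
        return False
    if not volume_check_running:      # step 1904: backpointer stream is trusted
        return True                   # step 1908
    # step 1906: during a volume check, only the check itself may delete
    return called_by_volume_check
```

The common store file is thus deleted only when the backpointer set is both empty and trusted.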

As can be seen from the foregoing detailed description, 
there is provided a method and system that provide for the 
identifying and merging of duplicate files. The method and 
system may operate dynamically as a real time background 
process, in an efficient manner. 

While the invention is susceptible to various modifica- 
tions and alternative constructions, a certain illustrated 
embodiment thereof is shown in the drawings and has been 
described above in detail. It should be understood, however, 
that there is no intention to limit the invention to the specific
form or forms disclosed, but on the contrary, the intention is 
to cover all modifications, alternative constructions, and 
equivalents falling within the spirit and scope of the inven- 
tion. 

What is claimed is:

1. A computer-readable medium having computer- 
executable instructions, comprising, automatically identify- 
ing at least two files having duplicate data, automatically 
merging the duplicate data of the files into a single instance 
representation of that data, converting each of the files into
logically separate links to the single instance representation, 
each link comprising a logically separate link file that 
provides logically separate file system access to the single 
instance representation of the file data, and reclaiming 
storage space that was occupied by the duplicate data of at
least one of the files. 

2. The computer-readable medium having computer- 
executable instructions of claim 1 wherein automatically 
identifying at least two files having duplicate data includes, 
adding file identifiers to a work item queue.

3. The computer-readable medium having computer- 
executable instructions of claim 2 wherein adding file iden- 
tifiers to a work item queue includes scanning a volume for 
file identifiers. 

4. The computer-readable medium having computer-
executable instructions of claim 3 wherein scanning the
volume for file identifiers occurs for a limited time. 

5. The computer-readable medium having computer- 
executable instructions of claim 2 wherein adding file
identifiers to a work item queue includes extracting file
information from a log of file activity.

6. The computer-readable medium of claim 5 having 
further computer-executable instructions for calculating a 
time for extracting file information from the log. 

7. The computer-readable medium having computer-
executable instructions of claim 6 wherein the time calcu-
lated is based on an amount of file information previously 
extracted from the log. 

8. The computer-readable medium having computer-
executable instructions of claim 1 wherein automatically
identifying at least two files having duplicate data includes, 
dequeuing a file identifier from a work item queue. 

9. The computer-readable medium of claim 8 having 
further computer-executable instructions for, querying a 
database of file information for a set of at least one file
having properties that match properties of a file correspond- 
ing to the identifier dequeued from the work item queue. 

10. The computer-readable medium having computer- 
executable instructions of claim 9 wherein querying the 
database of file information includes providing a file size of
the file corresponding to the identifier dequeued from the 
work item queue to a database manager. 
11. The computer-readable medium of claim 9 having
further computer-executable instructions for, calculating a 
signature of the file corresponding to the identifier dequeued 
from the work item queue. 

12. The computer-readable medium having computer- 
executable instructions of claim 11 wherein querying the 
database of file information includes providing the signature 
to a database manager. 

13. The computer-readable medium having computer- 
executable instructions of claim 12 wherein querying the 
database of file information further includes providing a file 
size of the file corresponding to the identifier dequeued from 
the work item queue to the database manager. 

14. The computer-readable medium of claim 9 having 
further computer-executable instructions for, receiving the 
set of at least one file having properties that match properties 
of the file corresponding to the identifier dequeued from the 
work item queue, and comparing the data of at least one file 
in the set with the data of the file corresponding to the 
identifier dequeued from the work item queue. 

15. The computer-readable medium having computer- 
executable instructions of claim 14 wherein comparing the 
data determines if each file is an exact duplicate of the other. 

16. A method of identifying files having similar properties 
on a file system volume, comprising, in a first operation, 
adding file information to a queue, in a second operation 
distinct from the first operation, removing file information 
from the queue, querying a database with at least one 
property of a file corresponding to the file information 
removed from the queue, and receiving a set of at least one 
file identifier, each file identifier in the set corresponding to 
a file having at least one similar property of the file corre- 
sponding to the file information removed from the queue. 

17. The method of claim 16 wherein adding file informa- 
tion to a work item queue includes scanning a volume for file 
identifier information. 

18. The method of claim 17 further comprising limiting 
the time for scanning the volume. 

19. The method of claim 17 wherein adding file informa- 
tion to a work item queue includes extracting file informa- 
tion from a log of file activity. 

20. The method of claim 19 further comprising calculat- 
ing a time for extracting file information from the log. 

21. The method of claim 20 further comprising, returning 
an amount of file information extracted from the log, and 
using the amount to calculate a next time for extracting file 
information from the log. 

22. The method of claim 16 further comprising calculat- 
ing a signature of the file corresponding to the file informa- 
tion removed from the queue, and wherein querying the 
database includes providing the signature to a database 
manager. 

23. The method of claim 22 wherein querying the data- 
base further includes providing a file size corresponding to 
the file information removed from the queue to the database 
manager. 

24. The method of claim 16 wherein querying the data- 
base includes providing a file size corresponding to the file 
information removed from the queue. 

25. The method of claim 16 further comprising, compar- 
ing data in the file that corresponds to the file information 
removed from the queue to the data in at least one file 
corresponding to file identifier information in the set, and if 
sufficiently similar, merging the files into a single instance 
representation thereof having independent links thereto. 

26. The method of claim 25 wherein comparing the data 
determines if each file is an exact duplicate of the other. 
27. A system for identifying files having similar properties
on a file system volume, comprising, a database including 
file property information, a database manager for querying 
the database, a work queue, a first component for adding file 
identifiers to the work queue, and a second component for 
removing file identifiers from the queue, the second com- 
ponent providing a query to the database manager, the query 
including property information corresponding to a file iden- 
tified by a file identifier removed from the queue, the second 
component receiving a set of file identifiers in response to 
the query, each identifier in the set corresponding to a file 
having property information that matches the file property 
information identified in the query. 

28. The system of claim 27 wherein the second compo- 
nent compares the data of the file corresponding to the file 
identifier removed from the queue with the data of at least 
one file corresponding to a file identifier returned in response 
to the query. 

29. The system of claim 28 wherein the second compo- 
nent performs a byte comparison of the data in each file to 
determine if the file data matches exactly. 

30. The system of claim 27 wherein if the comparison 
indicates the file data is similar, the second component calls 
a facility for merging the files, the facility providing a single 
instance representation of the file data and logically separate 
links thereto. 

31. The system of claim 27 further comprising a log for 
recording file activity, and wherein the first component 
extracts at least some of the file identifiers for adding to the 
work queue from the log. 
32. The system of claim 27 further comprising a third
component for scanning a volume to add file identifiers to 
the queue. 

33. The system of claim 27 wherein the file property 
information includes a file size.

34. The system of claim 27 wherein the file property 
information includes a signature. 

35. The system of claim 34 wherein the second component
computes a signature of the file corresponding to the file
removed from the queue.

36. The system of claim 27 wherein the first and second 
components are functions within a single process, and 
wherein a partition controller corresponding to a file system 
volume calls the functions. 

37. The system of claim 36 including a plurality of 
partition controllers, each partition controller corresponding
to a file system volume, and further comprising a central
controller for controlling the operation of the partition 
controllers. 

38. The system of claim 37 wherein the central controller 
operates the partition controllers as a background process.

39. The method of claim 16 wherein the first operation 
alternates with the second operation. 

40. The method of claim 39 wherein the first operation 
operates for a limited time, and the second operation operates
after the time to remove each set of file information
added to the queue in the first operation. 

41. A computer-readable medium having computer- 
executable instruction for performing the method of claim 
16. 

* * * * *
ABSTRACT

A method, system, and apparatus for detecting and repairing
damaged portions of a computer system is provided. In a
preferred embodiment of the present invention, a damage
detection and repair facility monitors and detects changes to
the computer system. The damage detection and repair
facility compares these changes to the set of constraints
defined by the working definitions for each application
installed on the computer system. The working definitions
define the invariant portions of each application and define
the constraints placed upon the computer system by each
application. Responsive to changes that are in conflict with
this set of constraints, the damage detection and repair
facility makes such changes in the persistent storage so as to
resolve these conflicts. This may be done, for example, by
repairing a damaged file, installing a missing driver, or
adjusting an environment variable.
     <requirements>
       <platform>
         <processor>x86</processor>
         <os>"Windows 95" or "Windows 98" or "Windows NT"</os>
       </platform>
       <services>
         <service>TCP/IP</service>
         <service>File System</service>
       </services>
       <prereqs>
506      <application>Acrobat</application>
       </prereqs>
       <registryEntry name="HKEY_CURRENT_USER/Software/X.Com">
         <key>DebugPort<value>TCP/IP Port#</value></key>
         <key>Java VM<value>x<test>x gt "1.1.6"</test></value></key>
       </registryEntry>
510    <directory name="programRoot"/>
       <directory name="jclass">programRoot"/jclass"</directory>
       <environmentVar name=path>
         <add>bin</add>
       </environmentVar>
512    <environmentVar name=classpath>
         <add>jclass"/jappNetUI.jar"</add>
         <test>programRoot"/testClasspath"</test>
       </environmentVar>
     </requirements>



FIG. 5 



U.S. Patent Oct. 28, 2003 Sheet 5 of 6 US 6,640,317 B1



[FIG. 6: block diagram. Working definitions of installed
applications (XML: identity, code, requirements, data, artifact,
settings) and the damage detection and repair facility (reference
numerals 602, 604), shown with the runtime representation 100:
processor 106, memory 104, and input and output with the real
world 108.]

[FIG. 7: flowchart. Start; monitor system; compare new changes
against working definitions; reverse, log, or report the change
(optional); accept changes and update settings in affected working
definitions. Reference numerals shown: 702, 706, 712.]

MECHANISM FOR AUTOMATED GENERIC
APPLICATION DAMAGE DETECTION AND
REPAIR IN STRONGLY ENCAPSULATED
APPLICATION

CROSS REFERENCE TO RELATED
APPLICATIONS

The present application is related to co-pending U.S.
patent application Ser. No. 09/552,863 entitled "Strongly
Encapsulated Environments for the Development,
Deployment, and Management of Complex Application
configurations" filed even date herewith, now abandoned, to
co-pending U.S. patent application Ser. No. 09/552,864
entitled "A Method for Creating Encapsulated Applications
with Controlled Separation from an Application's Runtime
Representation" filed even date herewith, and to co-pending
U.S. patent application Ser. No. 09/552,861 entitled "An
Application Development Server and a Mechanism for
Providing Different Views into the Same Constructs within
a Strongly Encapsulated Environment" filed even date
herewith. The content of the above mentioned commonly
assigned, co-pending U.S. Patent applications are hereby
incorporated herein by reference for all purposes.

BACKGROUND OF THE INVENTION

1. Technical Field

The present invention relates generally to the field of
computer software and, more particularly, to methods of
detecting and repairing damaged files and settings within a
data processing system.

2. Description of Related Art

Currently, computer systems are built up through a series
of installations provided by different software developers,
each of which installs one or more different software
components. There is no industry standard way to describe what
comprises an application. Without this description, there is
no way to implement a standard service that can protect the
integrity of each application.

There is a need for a standard, simple, scalable, platform
independent mechanism for detecting damaged applications
and repairing them. Applications can be damaged in a
variety of ways. Each of the following operations can
damage one or more applications within a computer system:

  Installation of new applications
  Reconfiguring applications
  The use of an application
  Application error
  User error
  Viruses

Installation is dangerous because the current approach in the
industry is a rather ad hoc approach in which the
responsibility for the installation procedure for each application
rests with each application's developer. During installation,
any error or conflict between any two or more applications
may potentially corrupt the computer system configuration.
Applications do not have access to the requirements and
dependencies of the other applications already installed on
the target computer system. Without this information, no
modification to a computer system's files and settings can be
made completely safe. Yet the modification of files and
settings is required to install applications onto computer
systems.

The reconfiguring of an application may also require that
changes be made to files and configuration settings needed
or used by other applications, and such changes may render
one or more of the other applications inoperable.

The use of applications may also require that the files and
settings within a computer system be updated from time to
time. This requires applications to be "well behaved" with
respect to each other. Conflicts may still occur, either by
chance, or error within an application. Yet the application
developer's priority is always to their application without
regard to potential conflicts with other applications, except
to ensure their application wins such conflicts. Because the
requirements and constraints of each application are not
defined, each developer also becomes responsible for
supporting the configuration management of every system on
which the application is installed, and for handling conflicts
in all configurations in which their application may be used.
However, this responsibility is rarely, if ever, each
application developer's top priority.

A [...] system of deployed and running applications [...]
prevent a given application from damaging other
applications. [...], if handled in the given application's
favor, may prevent a [...] application from running.
This state of affairs provides little motivation to avoid
conflicts with a competitor's application so long as the
developer's own application can be configured properly.
Each application is dependent on its own installation and
runtime code to verify its configuration and access to
prerequisite devices, drivers, applications, and system
settings.

Users themselves may cause problems by modifying files
or configuration settings, either by accident, failure to follow
a procedure properly, etc. Often all the files and
configuration settings are exposed to modification by the user.

In reality, virus attacks account for a very small
percentage of [...] failures. Yet a viral attack can result in
[...] amount of [...] the accepted method of protecting a
computer system from damage from outside viral attacks is
[...] of [...] vendor virus protection
products. However, these products must be updated
frequently to be able to detect the latest viruses. Any new
virus that has been created since the software was updated
may be undetected by the virus protection software, thus
enabling that virus to corrupt or destroy files necessary for
the proper performance of the computer.

Therefore, there is a need for a method, system and
apparatus that automatically detects damaged files and
applications and restore them to their proper [...].

SUMMARY OF THE INVENTION

The present application provides a method, system, and
apparatus for detecting and repairing damaged portions of a
computer system. In a preferred embodiment of the present
invention, a damage detection and repair facility monitors
and detects changes to the computer system. The damage
detection and repair facility compares these changes to the
set of constraints defined by the working definitions for each
application installed on the computer system. The working
definitions define the invariant portions of each application
and define the constraints placed upon the computer system
by each application. Responsive to changes that are in
conflict with this set of constraints, the damage detection and
repair facility makes such changes in the persistent storage
so as to resolve these conflicts. This may be done, for
example, by repairing a damaged file, installing a missing
driver, or adjusting an environment variable.

BRIEF DESCRIPTION OF THE DRAWINGS

The novel features believed characteristic of the invention
are set forth in the appended claims. The invention itself,
however, as well as a preferred mode of use, further
objectives and advantages thereof, will best be understood by
reference to the following detailed description of an
illustrative embodiment when read in conjunction with the
accompanying drawings, wherein:

FIG. 1 depicts a block diagram illustrating a basic prior art
computer architecture structure;

FIG. 2 depicts a block diagram of a data processing
system in which the present invention may be implemented;

FIG. 3 depicts a block diagram of a personal digital
assistant (PDA) in which the present invention may be
implemented;

FIG. 4A depicts a block diagram illustrating a data
structure for strongly encapsulating an application in
accordance with the present invention;

FIG. 4B depicts a block diagram of a new model for a
computer architecture in accordance with a preferred
embodiment of the present invention;

FIG. 5 depicts a portion of XML code demonstrating one
method the requirements part of the working definition may
be represented in accordance with a preferred embodiment
of the present invention;

FIG. 6 depicts a block diagram illustrating a method of
automated damage detection and repair within a computing
system in accordance with a preferred embodiment of the
present invention; and

FIG. 7 depicts a flowchart illustrating an exemplary
method of implementing a damage detection and repair
facility in accordance with the present invention.

DETAILED DESCRIPTION OF THE
PREFERRED EMBODIMENT

With reference now to the figures and, in particular, with
reference to FIG. 1, a block diagram illustrating a basic prior
art computer architecture structure is depicted. Before
proceeding, it should be noted that throughout this
description, identical reference numbers refer to similar or
identical features in the different Figures.

All existing computer architectures are patterned after a
core, basic computer architecture 100. This core, basic
computer architecture is comprised of four elements: one or
more processors 106, one or more memory spaces 104, one
or more mechanisms for managing input/output 108, and
one or more mechanisms for defining persistence 102. The
Processor(s) 106 perform(s) the computations and
instructions of a given application by loading its initial state into
memory from the persisted image for that application.

Persistence 102 is the persisted storage of the applications
apart from memory 104. The basic nature of computer
systems requires that all applications be stored somewhere.
Even if a person types a program into a computer system
every time they wish to use that program, the program is
always stored somewhere, even if that storage is in
someone's head. How an application is stored does not vary much
from platform to platform. Applications are stored as, for
example, executable files and libraries, files, databases,
registry entries, and environment variables. Typically these
files, databases, etc. are stored on a physical nonvolatile
storage device such as, for example, Read only Memory
chips, a hard disk, a tape, CD-ROM, or a DVD. Even in the
most complex applications, the number of distinct persisted
structures is really rather small, although there can be many

application will execute. If the persisted structure is wrong,
the application will not execute.

With reference now to FIG. 2, a block diagram of a data
processing system in which the present invention may be
implemented is illustrated. Data processing system 250 is an
example of a computer which conforms to the architecture
depicted in FIG. 1. Data processing system 250 employs a
peripheral component interconnect (PCI) local bus
architecture. Although the depicted example employs a PCI bus,
other bus architectures such as Micro Channel and ISA may
be used. Processor 252 and main memory 254 are connected
to PCI local bus 256 through PCI Bridge 258. PCI Bridge
258 also may include an integrated memory controller and
cache memory for processor 252. Additional connections to
PCI local bus 256 may be made through direct component
interconnection or through add-in boards. In the depicted
example, local area network (LAN) adapter 260, SCSI host
bus adapter 262, and expansion bus interface 264 are
connected to PCI local bus 256 by direct component connection.
In contrast, audio adapter 266, graphics adapter 268, and
audio/video adapter (A/V) 269 are connected to PCI local
bus 256 by add-in boards inserted into expansion slots.
Expansion bus interface 264 provides a connection for a
keyboard and mouse adapter 270, modem 272, and
additional memory 274. SCSI host bus adapter 262 provides a
connection for hard disk drive 276, tape drive 278, and
CD-ROM 280 in the depicted example. Typical PCI local
bus implementations will support three or four PCI
expansion slots or add-in connectors.

An operating system runs on processor 252 and is used to
coordinate and provide control of various components
within data processing system 250 in FIG. 2. The operating
system may be a commercially available operating system
such as JavaOS For Business or OS/2, which are available
from International Business Machines Corporation. Those
of ordinary skill in the art will appreciate that the hardware
in FIG. 2 may vary depending on the implementation. For
example, other peripheral devices, such as optical disk
drives and the like, may be used in addition to or in place of
the hardware depicted in FIG. 2. The depicted example is not
meant to imply architectural limitations with respect to the
present invention. For example, the processes of the present
invention may be applied to a multiprocessor data
processing system.

Turning now to FIG. 3, a block diagram of a personal
digital assistant (PDA) is illustrated in which the present
invention may be implemented. A PDA is a data processing
system (i.e., a computer) which is small and portable. As
with the computer depicted in FIG. 2, and as with all
computers, PDA 300 conforms to the computer architecture
depicted in FIG. 1. The PDA is typically a palmtop
computer, such as, for example, a Palm VII®, a product and
registered trademark of 3Com Corporation in Santa Clara,
Calif., which may be connected to a wireless
communications network and which may provide voice, fax, e-mail,
and/or other types of communication. The PDA 300 may
perform other types of facilities to the user as well, such as,
for example, provide a calendar and day planner. The PDA
300 may have one or more processors 302, such as a
microprocessor, a main memory 304, a disk memory 306,
and an I/O 308 such as a mouse, keyboard, or pen-type input,
and a screen or monitor. The PDA 300 may also have a
wireless transceiver 310 connected to an antenna 312
configured to transmit and receive wireless communications,
of each type. 65 The processor 302, memories 304, 306, I/O 308, and trans- 
Regardless of the application's complexity, it depends on ceiver are connected to a bus 304. The bus transfers data, 
its persisted image. If the persisted structure is correct, the i.e., instructions and information, between each of the 
devices connected to it. The I/O 308 may permit faxes,
e-mail, or optical images to be displayed on a monitor or
printed out by a printer. The I/O 308 may be connected to a
microphone 316 and a speaker 318 so that voice or sound
information may be sent and received.

The computers depicted in FIGS. 2 and 3 are examples of
computers that conform to the computer architecture paradigm
depicted in FIG. 1. The sophistication of the system,
number of components, and speed of the processor may
vary, but the basic model is the same as the architecture 100
depicted in FIG. 1: means for input and output such as a
keyboard and display (whether LED or a video display
terminal), a processor, memory, and persistence. The persistence
may be, for example, stored on a hard disk or hard
wired into the system. However, the computers depicted in
FIGS. 2 and 3 are merely examples. Other computers, in fact
all current computers, also conform to this model. Examples
of other such computers include, but are not limited to,
mainframe computers, workstations, laptop computers, and
game consoles, both portable and conventional game consoles
that connect to a television. Examples of game consoles
include, for example, a Sony Playstation® and a
Nintendo Gameboy®. Playstation is a trademarked product
of Sony Corporation of Tokyo, Japan. Gameboy is a product
and a registered trademark of Nintendo of America Inc. of
Redmond, Wash., a wholly owned subsidiary of Nintendo
Co., Ltd., of Kyoto, Japan.

The computer of FIG. 2 and the PDA of FIG. 3 are also
suitable to implement the data structures, methods, and
apparatus of the present invention. The present invention
modifies the computer architecture paradigm illustrated in
FIG. 1 in a way such as to present a new computer
architecture paradigm with features and advantages as
described below.

The present invention provides a data structure, process,
system, and apparatus for allowing applications and computer
systems to configure and manage themselves. The
present invention also provides a process, system, and
apparatus for creating encapsulated applications with controlled
separation from an application's runtime representation.
This separation is critical and necessary. The application's
Encapsulation is provided by the Working Definition
for the application, while the runtime image allows the
application to execute within traditional execution
environments, as defined by Windows, Unix, OS/2, and
other operating system platforms.

With reference now to FIG. 4A, a block diagram illustrating
a data structure for strongly encapsulating an application
is depicted in accordance with the present invention.
In a preferred embodiment, any software application may
consist of the following elements: identity 404, code 406,
requirements 408, data 410, artifacts 412, and settings 414.
These elements structure the application's defining characteristics.
The defining characteristics define the persisted
state, settings, and structures required to build a valid
runtime representation of the application within its targeted
computer system. The application's working definition
documents and controls each of these elements of the
application.

The application's identity 404 is defined by the application
developer, and consists of the application's name,
version, etc. The identity section 404 may also provide
documentation, web links, etc. for the application useful
both to automated management services and users.

The code 406 represents the executable images, files,
source code, and any other representation of the executable
portion of the application that the application developer may
decide to use. The code section 406 may also include
executable images, files, source code, etc. to enable the
installation, maintenance, configuration, uninstall, etc. of the
application through the application's life cycle.

The requirements section 408 defines what directory
structures, environment variables, registry settings, and
other persisted structures must exist and in what form for
this application to be properly constructed and installed in
the persisted image of a computer system. The requirements
section 408 also details any services and applications that
are required to support this application. The requirements
section 408 details these concepts as a set of constraints, and
also may define the order in which these constraints must be
resolved when such order is required. The requirements 408
also define the circumstances and constraints for the use of
application specific installation, maintenance, configuration,
uninstall, etc. code.

The Data section 410 defines data tables, configuration
files, and other persisted structures for the application. The
data section 410 is used to hold UI preferences, routing
tables, and other information that makes the application
more usable.

The artifacts 412 are the persisted form of the value
provided by the application. Examples of Artifacts 412
include documents, spreadsheets, databases, mail folders,
text files, image files, web pages, etc. These are the files that
contain the user's work ("user" in this context can be human
or application).

The settings 414 are the persisted structures created or
modified within the runtime representation that are intended
to satisfy the requirements for the application. A distinction
is drawn between the requirements 408 (which may specify
a range of values in a registry setting or the form a directory
structure must conform to) and the actual solution constructed
in the runtime image 100 to satisfy these requirements.

The Requirements 408 are provided by the developers,
application management software, environments, etc. Settings
are made in an effort to resolve those requirements, and
identify what requirements a setting is intended to satisfy.

With reference now to FIG. 4B, a block diagram of a new
model for a computer architecture 400 is depicted in accordance
with a preferred embodiment of the present invention.
The new computer architecture 400 of the present invention
builds on the architecture depicted in FIG. 1.
However, the new computer architecture 400 also includes
an application's working definition 402. In a preferred
embodiment, an application's working definition 402 is an
extensible markup language (XML) representation of the
identity 404, code 406, requirements 408, data 410, artifacts
412, and settings 414; that is, it includes all of the defining
characteristics for the application. By defining these
elements separately from the runtime representation 100 of an
application, working definitions 402 provide a way to
strongly encapsulate an application which is compatible
with operating systems that expect applications in an
unencapsulated form. Working definitions define a new, universal
picture of a computer system. Working definitions do not
interfere with any operating system or platform because they
have no active behavior. Working definitions define the
application's runtime representation 100 by defining what an
application is, what it requires, and what was done to give
it what it needs, within a given computer system. While
working definitions have no active behavior, they enable the
implementation of a variety of automated services. These
services include application life cycle management, resource
tracking, application installation, damage detection and 
repair, computer system optimization, etc. 

Strong encapsulation of state greatly reduces the global 
complexity of a computer system. States such as, for
example, path requirements, file extensions, registry
settings, and program files can be maintained in a pure state
outside of the runtime representation 100. Because the
runtime version of the application 100 is a redundant
representation, it can be reconstructed and fixed using the
encapsulated version of the application as the standard.
Important modifications can be persisted back to the appli- 
cation's Working Definition as a background task. 
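
As an illustration only, the idea of fixing the redundant runtime copy from the encapsulated copy can be sketched in a few lines of Python. The sketch assumes the encapsulated version is available as an in-memory mapping from relative file paths to known-good bytes. The function names `sha256_of` and `repair_runtime`, and the choice of SHA-256, are hypothetical and are not taken from the described embodiment.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def repair_runtime(runtime_dir: Path, encapsulated: Dict[str, bytes]) -> List[str]:
    """Restore every runtime file that is missing or differs from the
    encapsulated (known-good) copy.

    `encapsulated` is a stand-in for the encapsulated version of the
    application: relative path -> expected file contents (an assumption
    made for this sketch). Returns the sorted list of repaired paths.
    """
    repaired = []
    for rel_path, good_bytes in encapsulated.items():
        target = runtime_dir / rel_path
        damaged = (
            not target.is_file()
            or sha256_of(target.read_bytes()) != sha256_of(good_bytes)
        )
        if damaged:
            # Rebuild the runtime copy from the encapsulated standard.
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(good_bytes)
            repaired.append(rel_path)
    return sorted(repaired)
```

Running `repair_runtime` a second time on an intact runtime directory returns an empty list, which reflects the point that the runtime representation is redundant and can always be checked against the encapsulated standard.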

Working definitions use XML to define a flexible format 
for structuring the critical information about an application, 
thus encapsulating the application. XML provides the ability 
to extend working definitions to fit future requirements, and
XML, just like the hypertext markup language (HTML) of
the World Wide Web, can be as simple or as complex as the
job requires. The actual size of a working definition is very 
small compared to the application it defines, even if the 
application is just a batch file. At the same time, working 
definitions can be used to describe vast, distributed, multi- 
platform applications. 

Working definitions are platform and technology inde- 
pendent and address universal configuration issues. 
Furthermore, working definitions can be used as an open 
standard. Working definitions provide configuration infor- 
mation to services and to applications as well as providing 
bookkeeping support. 

An application's valid runtime representation is one that 
satisfies a finite, structured set of constraints. Therefore, 
XML is an ideal method because XML provides an ideal 
representation for structured data and supports the ability to 
define links, either for tracking internal relationships or
for including references to structures defined externally.
Furthermore, XML is designed to be easily extended and is 
platform independent. 
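
To make the notion of a valid runtime representation concrete, the following minimal Python sketch models a set of constraints as named predicates over a simplified runtime state. The `Constraint` class, the `violated` function, and the flat dictionary used as the runtime state are assumptions introduced for illustration; the description does not prescribe any of them.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Simplified stand-in for the runtime representation: setting name -> value.
State = Dict[str, str]


@dataclass(frozen=True)
class Constraint:
    """One requirement from a (hypothetical) working definition."""
    name: str
    check: Callable[[State], bool]


def program_root_on_path(state: State) -> bool:
    """True when the program root directory is listed on PATH."""
    root = state.get("programRoot")
    return bool(root) and root in state.get("PATH", "").split(";")


def violated(constraints: List[Constraint], state: State) -> List[str]:
    """Return the names of all constraints the state fails to satisfy."""
    return [c.name for c in constraints if not c.check(state)]


CONSTRAINTS = [
    Constraint("platform is x86", lambda s: s.get("platform") == "x86"),
    Constraint("program root on PATH", program_root_on_path),
]
```

A runtime state is valid in this sketch exactly when `violated(CONSTRAINTS, state)` returns an empty list. Any non-empty result names the requirements that would have to be resolved.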

However, XML is not a good representation for constant 
modification and access. The runtime representation for the 
application provides the application with an efficient execut- 
able representation. To provide an application with an effi- 
cient executable representation, the XML can detail the 
required runtime structure and requirements for the appli- 
cation. Using these instructions, the application can be 
constructed and executed automatically. Prerequisites, in the
proper order and with the proper configuration, will also be
constructed and executed automatically.

As the application executes, changes made to the application's
state may be made as they are presently, that is, to
the runtime representation. The XML specifies the important
files to be persisted, and this persistence may be done in the
background. The overhead is thus minimal, with little
effect on performance as viewed from a user's perspective.

The nature of the problem of automating management and 
configuration of computer systems requires a common, 
computer friendly and people friendly, complete represen- 
tation of each software component. Without this 
information, the automated management of computer sys- 
tems is not possible. 

While the working definitions have been described herein 
and will be described henceforth with reference to an XML 
representation of the application, any number of other
technologies and approaches might be used as well, as will be
obvious to one of ordinary skill in the art. For example,
encapsulation could also be implemented using databases or
with Zip files. Also, a database could be used for the actual 
implementation of the support for the encapsulation with 
XML used as a transfer protocol. 

Furthermore, although described primarily with reference 
to constructing Working Definitions using XML, Working 
Definitions may be constructed using any structured format. 
Possible choices, for example, include the use of other 
structured forms. These include: 

Object based technologies (Java Beans, CORBA objects,
C++ objects, etc.). These would have to be backed by 
some sort of database. 
Object Oriented databases. These are a mix of databases
and Object based technologies.
Functional technologies. One could build a PostScript- 
like or Lisp-like language (compiled or interpreted) that 
defines these structures. 
Rule based technologies. Since Working Definitions generally
resolve sets of constraints, this may well be the
best way of implementing services that execute
against the constraints that Working Definitions define.
Working definitions could be constructed directly into
Rule form, similar to forward chaining technologies
such as ART, or backward chaining technologies such
as Prolog.

Tagged formats. XML is an example of a tagged format. 
However, there are other tagged formats that would 
work just as well. Tagged Image File Format (TIFF),
although generally used for defining images, may also 
be used to define Working Definitions since the use of 
custom tags is supported. Additionally, since TIFF is a 
binary format, it could hold the executable code and 
binary data. Other tagged formats include SGML, TeX,
and LaTeX.

In addition to these examples of structured formats, there 
are certainly other formats and representations that would 
also work for constructing Working Definitions. Any technology
that can 1) represent structured data, 2) support links
to outside sources, and 3) be used across platforms could
be used to define Working Definitions.

With reference now to FIG. 5, a portion of XML code is 
shown, demonstrating one method for representing the 
requirements element of the working definition, in accor- 
dance with a preferred embodiment of the present invention. 
The portion 500 of XML depicted is only a snippet of a
representation of the requirements element of the working
definition and is given merely as an example. Furthermore,
this portion 500 of XML is overly simplified, but does
demonstrate how this information might be represented.

Platform tag at 502 reflects the requirement of the application
that the platform be an x86 type of computer, such
as, for example, an Intel Pentium® class computer, running 
a Windows 95™, Windows 98™, or Windows NT™ oper- 
ating system. Pentium is a registered trademark of Intel 
Corporation of Santa Clara, Calif. Windows 95™, Windows 
98™, and Windows NT™ are all either trademarks or 
registered trademarks of Microsoft Corporation of 
Redmond, Wash. 

Services tag at 504 indicates that the application requires 
the TCP/IP and file system services. Prerequisites tag at 506 
indicates that the application Adobe Acrobat® is a prereq- 
uisite (i.e., requirement) for this application. Adobe Acrobat 
is a registered trademark of Adobe Systems Incorporated of 
San Jose, Calif. 
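
Because FIG. 5 is not reproduced in this text, the following Python sketch uses a hypothetical XML fragment to show how platform, services, and prerequisites requirements of the kind just described might be read by a management service. The element and attribute names (`requirements`, `platform`, `processor`, `os`, `services`, `service`, `prerequisites`, `application`) and the function `parse_requirements` are assumptions made for this illustration and are not the markup of portion 500.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Union

# Hypothetical requirements fragment; the tag names are illustrative only.
REQUIREMENTS_XML = """
<requirements>
  <platform processor="x86">
    <os>Windows 95</os>
    <os>Windows 98</os>
    <os>Windows NT</os>
  </platform>
  <services>
    <service>TCP/IP</service>
    <service>file system</service>
  </services>
  <prerequisites>
    <application>Adobe Acrobat</application>
  </prerequisites>
</requirements>
"""


def parse_requirements(xml_text: str) -> Dict[str, Union[str, List[str]]]:
    """Extract platform, services, and prerequisites from the fragment."""
    root = ET.fromstring(xml_text.strip())
    platform = root.find("platform")
    return {
        "processor": platform.get("processor"),
        "operating_systems": [e.text for e in platform.findall("os")],
        "services": [e.text for e in root.findall("services/service")],
        "prerequisites": [
            e.text for e in root.findall("prerequisites/application")
        ],
    }
```

Calling `parse_requirements(REQUIREMENTS_XML)` yields a plain dictionary that a service could compare against the actual platform, available services, and installed applications.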

The registry tag at 508 describes a registry entry. In this 
example, a Registry entry and a couple of keys are defined. 
The Directory tag at 510 describes a couple of directories,
the "programRoot" (which is required, but not specified by
name) and a subdirectory "programRoot"/class.

Section 512 describes adding the programRoot directory
to the path environment variable. The directory
"programRoot"/class is also described as necessary to be
added to the classpath environment variable.

The <test> . . . </test> tags in Section 512 describe tests
that can be used to verify the setup of the enclosing section.
This is one possible way of describing how to use application
specific code to maintain an application.

With reference now to FIG. 6, a block diagram illustrating
a method of automated damage detection and repair within
a computing system is depicted in accordance with a preferred
embodiment of the present invention. Currently, in the
prior art, each application is responsible for its own integrity
within a computer system. Automated facilities for detecting
conflicts and resolving them, detecting damaged files or
detecting missing registry entries are severely limited with
prior art systems.

Conflicts and modifications to the runtime environment
100 cannot be avoided. They are the natural result of using
a computer system, since the use of a computer system
naturally leads to installing new applications, application
updates, production and management of application
artifacts, use of temporary files, and other changes in the
system's configuration.

Damage detection and repair facility 602 uses the requirements
defined in the working definitions 604 of all the
installed applications as a set of constraints. Whenever
changes occur to the runtime environment 100 or to any of
the encapsulated applications, damage detection and repair
facility 602 detects these changes and checks them against
the set of constraints defined by the working definitions of
the encapsulated applications installed on the computer
system. The XML working definitions of each encapsulated
application define the invariant portions of the application
and the user's work which should be persisted as contrasted
with temporary files that need not be persisted. An additional
digitally signed section of the XML application working
definition can provide assurance that checksums and file
sizes for each of the application's files in the runtime
environment 100 have not been modified. If a file has been
modified, as detected by damage detection and repair facility
602, that file can be repaired with a signed, known valid
version.

The set of files subject to inspection is very limited, since
the working definitions define exactly which files should be
checked. Instead of having to verify each and every file, the
search can be limited to only those files the working definitions
identify as critical. More exhaustive checks can be
done, but the basic runtime representation verification does
not require them.

With reference now to FIG. 7, a flowchart illustrating an
exemplary method of implementing a damage detection and
repair facility, such as, for example, damage detection and
repair facility 602 in FIG. 6, is depicted in accordance with
the present invention.

The damage detection and repair facility monitors the
computer system for changes (step 702) and determines
whether the data processing system has received an indication
to power down (step 703). If an indication to power
down the data processing system has been received, then the
process ends. If no indication to power down has been
received, then the damage detection and repair facility
determines whether changes have been made to any files,
settings or encapsulated applications (step 704). Searches
can be avoided when applications use a notification facility
in order to facilitate this process. However, this kind of
interaction with the application is not required. If no changes
have been made, then the damage detection and repair
facility 602 continues to monitor the computer system (step
702).

If changes have been made to some aspect of the runtime
representation of an application as defined by that application's
working definition, then the damage detection and
repair facility compares the change against the constraints as
defined by the set of working definitions in the computer
system (step 706) and determines whether the change creates
any conflict (step 708). If the change affects one or more
working definitions without conflict, it is recorded in the
settings section for the affected application(s) (step 710).
Conflicts are resolved by restoring or adjusting the runtime
representations affected. Changes to temporary files and
settings (as defined by the working definitions of the applications
that use them) do not cause conflicts.

In systems where security is very important, an optional
test (step 711) for changes outside those defined as reasonable
can be made. In this optional version, if the change does
not create any conflict, then the damage detection and repair
facility determines whether the change affects any working
definition defined software component (step 711). This is
done to detect those changes outside the defined constraints
of the system. If the change does affect a working definition
software component, then the settings of the affected working
definitions are updated (step 712). If the change does not
affect any working definition defined software component,
then any number of strategies may be used (step 713) to
address this security concern, including reversing the
changes, logging the changes for inspection later, or reporting
these changes to some monitoring facility. The damage
detection and repair facility then continues to monitor the
system (step 702).

Thus, the working definitions of installed applications
allow a computer system to be modeled as a set of applications
that impose a set of constraints on the runtime
representation of the computer system. These constraints
define all of the elements of an application that can possibly
be persisted as described above. When settings change, the
damage detection and repair facility can evaluate these
changes against this set of constraints as defined by the
working definitions of each application. If the constraints are
still met, then the settings can be recorded in the settings
section of the affected working definitions. If the constraints
are not met, the settings can be adjusted (and recorded) as
required to restore the computer system back to its proper
configuration.

It is important to note that while working definitions
embody logically the totality of all of the defining characteristics
of an application, the working definition may in fact
be implemented as a distributed entity, rather than the single
entity located on an individual machine as the present
invention has been primarily described. Components of it
may be accessed via links and may be physically stored on
servers, hard drives, or other media and accessed over buses,
the Internet, an intranet, or other channels.

It is important to note that while the present invention has
been described in the context of a fully functioning data
processing system, those of ordinary skill in the art will
appreciate that the processes of the present invention are
capable of being distributed in the form of a computer
readable medium of instructions and a variety of forms and
that the present invention applies equally regardless of the
particular type of signal bearing media actually used to carry
out the distribution. Examples of computer readable media
include recordable-type media, such as a floppy disk, a hard
disk drive, a RAM, CD-ROMs, DVD-ROMs, and
transmission-type media, such as digital and analog communications
links, wired or wireless communications links
using transmission forms, such as, for example, radio frequency
and light wave transmissions. The computer readable
media may take the form of coded formats that are
decoded for actual use in a particular data processing
system.

The description of the present invention has been presented
for purposes of illustration and description, and is not
intended to be exhaustive or limited to the invention in the
form disclosed. Many modifications and variations will be
apparent to those of ordinary skill in the art. The embodiment
was chosen and described in order to best explain the
principles of the invention, the practical application, and to
enable others of ordinary skill in the art to understand the
invention for various embodiments with various modifications
as are suited to the particular use contemplated.

What is claimed is:

1. A method of detecting and repairing damaged portions
of a computer system, the method comprising:

detecting a change to the computer system;

comparing the change to working definitions for each
application installed on the computer system; wherein
the working definitions comprise a set of constraints
placed upon the computer system by each application
installed on the computer system; and

responsive to a determination that the change is in conflict
with the set of constraints, modifying a persistent
storage so as to resolve the conflict.

2. The method as recited in claim 1, further comprising:

responsive to a determination that the change does not
create a conflict with the set of constraints, updating
settings in the working definitions of affected applications.

3. The method as recited in claim 1, wherein the working
definition of each application is provided by an extensible
markup language representation.

4. The method as recited in claim 1, wherein the step of
modifying the persistent storage comprises repairing the
runtime image of an application based on an encapsulated
representation of the application.

5. The method as recited in claim 1, wherein the step of
modifying the persistent storage comprises repairing a damaged
file using a correct version of the file from the working
definition.

6. The method as recited in claim 5, wherein the correct
version of the file is retrieved from a server.

7. The method as recited in claim 6, wherein the server is
accessed via a network.

8. The method as recited in claim 7, wherein the network
is an Internet.

9. The method as recited in claim 1, further comprising:

responsive to a determination that the change does not
create a conflict with the set of constraints, determining
whether the change affects any working definition
defined software component; and

responsive to a determination that the change does not
affect any working definition defined software
component, reversing the change.

10. The method as recited in claim 1, further comprising:

responsive to a determination that the change does not
create a conflict with the set of constraints, determining
whether the change affects any working definition
defined software component; and

responsive to a determination that the change does not
affect any working definition defined software
component, logging the change.

11. The method as recited in claim 1, further comprising:

responsive to a determination that the change does not
create a conflict with the set of constraints, determining
whether the change affects any working definition
defined software component; and

responsive to a determination that the change does not
affect any working definition defined software
component, reporting the change to a monitoring facility.

12. A computer program product in computer readable
media for use in a data processing system for detecting and
repairing damaged portions of a computer system, the computer
program product comprising:

first instructions for detecting a change to the computer
system;

second instructions for comparing the change to working
definitions for each application installed on the computer
system; wherein the working definitions comprise
a set of constraints placed upon the computer system by
each application installed on the computer system; and

third instructions, responsive to a determination that the
change is in conflict with the set of constraints, for
modifying a persistent storage so as to resolve the
conflict.

13. The computer program product as recited in claim 12,
further comprising:

fourth instructions, responsive to a determination that the
change does not create a conflict with the set of
constraints, for updating settings in the working definitions
of affected applications.

14. The computer program product as recited in claim 12,
wherein the working definition of each application is provided
by an extensible markup language representation.

15. The computer program product as recited in claim 12,
wherein the step of modifying the persistent storage comprises
repairing the runtime image of an application based
on an encapsulated representation of the application.

16. The computer program product as recited in claim 12,
wherein the step of modifying the persistent storage comprises
repairing a damaged file using a correct version of the
file from the working definition.

17. The computer program product as recited in claim 16,
wherein the correct version of the file is retrieved from a
server.

18. The computer program product as recited in claim 17,
wherein the server is accessed via a network.

19. The computer program product as recited in claim 18,
wherein the network is an Internet.

20. The computer program product as recited in claim 12,
further comprising:

fourth instructions, responsive to a determination that the
change does not create a conflict with the set of
constraints, for determining whether the change affects
any working definition defined software component;
and

fifth instructions, responsive to a determination that the
change does not affect any working definition defined
software component, for reversing the change.

21. The computer program product as recited in claim 12,
further comprising:

fourth instructions, responsive to a determination that the
change does not create a conflict with the set of
constraints, for determining whether the change affects
any working definition defined software component;
and
fifth instructions, responsive to a determination that the
change does not affect any working definition defined
software component, for logging the change.

22. The computer program product as recited in claim 12, 
further comprising: 

fourth instructions, responsive to a determination that the
change does not create a conflict with the set of
constraints, for determining whether the change affects
any working definition defined software component;
and

fifth instructions, responsive to a determination that the
change does not affect any working definition defined
software component, for reporting the change to a
monitoring facility.

23. A system for detecting and repairing damaged por- 
tions of a computer system, the system comprising: 

means for detecting a change to the computer system; 

means for comparing the change to working definitions 
for each application installed on the computer system;
wherein the working definitions comprise a set of 
constraints placed upon the computer system by each 
application installed on the computer system; and 

means, responsive to a determination that the change is in 
conflict with the set of constraints, for modifying a
persistent storage so as to resolve the conflict. 

24. The system as recited in claim 23, further comprising: 
means, responsive to a determination that the change does 

not create a conflict with the set of constraints, for 
updating settings in the working definitions of effected
applications. 

25. The system as recited in claim 23, wherein the 
working definition of each application is provided by an 
extensible markup language representation. 

26. The system as recited in claim 23, wherein the step of
modifying the persistent storage comprises repairing the 
runtime image of an application based on an encapsulated 
representation of the application. 



27. The system as recited in claim 23, wherein the step of 
modifying the persistent storage comprises repairing a dam- 
aged file using a correct version of the file from the working 
definition. 

28. The system as recited in claim 27, wherein the correct 
version of the file is retrieved from a server. 

29. The system as recited in claim 28, wherein the server 
is accessed via a network. 

30. The system as recited in claim 29, wherein the 
network is an Internet. 

31. The system as recited in claim 23, further comprising: 
means, responsive to a determination that the change does

not create a conflict with the set of constraints, for 
determining whether the change effects any working 
definition defined software component; and 
means, responsive to a determination that the change does 
not effect any working definition defined software 
component, for reversing the change. 

32. The system as recited in claim 23, further comprising: 
means, responsive to a determination that the change does 

not create a conflict with the set of constraints, for 
determining whether the change effects any working 
definition defined software component; and 
means, responsive to a determination that the change does 
not effect any working definition defined software 
component, for logging the change. 

33. The system as recited in claim 23, further comprising: 
means, responsive to a determination that the change does 

not create a conflict with the set of constraints, deter- 
mining whether the change effects any working defi- 
nition defined software component; and 
means, responsive to a determination that the change does 
not effect any working definition defined software 
component, for reporting the change to a monitoring 
facility. 
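
By way of illustration only, the flow recited in claims 23 and 32 (detect a change, compare it to the working definitions, repair a conflict, otherwise log an unrelated change) might be sketched as follows. The class `RepairSketch`, the `WorkingDefinition` type, the `Outcome` values, and the representation of persistent storage as an in-memory map are all assumptions made for this sketch; none of them is taken from the claims.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class RepairSketch {

    /** Hypothetical working definition: the file contents one application requires, keyed by path. */
    static final class WorkingDefinition {
        final String application;
        final Map<String, String> requiredContents;

        WorkingDefinition(String application, Map<String, String> requiredContents) {
            this.application = application;
            this.requiredContents = requiredContents;
        }
    }

    /** Hypothetical result labels for one detected change. */
    enum Outcome { REPAIRED, CONSISTENT, LOGGED }

    /**
     * Compares a changed path against every working definition. A conflict is
     * resolved by writing the required contents back to storage; a change that
     * no definition covers is only logged.
     */
    public static Outcome handleChange(String path,
                                       Map<String, String> storage,
                                       List<WorkingDefinition> definitions,
                                       List<String> log) {
        boolean covered = false;
        for (WorkingDefinition definition : definitions) {
            String required = definition.requiredContents.get(path);
            if (required == null) {
                continue; // this application places no constraint on the path
            }
            covered = true;
            if (!required.equals(storage.get(path))) {
                storage.put(path, required); // modify storage so the conflict is resolved
                return Outcome.REPAIRED;
            }
        }
        if (covered) {
            return Outcome.CONSISTENT;
        }
        log.add("change outside every working definition: " + path);
        return Outcome.LOGGED;
    }

    public static void main(String[] args) {
        Map<String, String> required = new HashMap<>();
        required.put("app/config.xml", "good");
        List<WorkingDefinition> definitions = new ArrayList<>();
        definitions.add(new WorkingDefinition("editor", required));

        Map<String, String> storage = new HashMap<>();
        storage.put("app/config.xml", "corrupted");
        List<String> log = new ArrayList<>();

        System.out.println(handleChange("app/config.xml", storage, definitions, log));
        System.out.println(handleChange("tmp/scratch.txt", storage, definitions, log));
    }
}
```

The sketch stops at the first conflicting definition; reversing or reporting the change, as in claims 31 and 33, would replace the logging branch.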
FIG. 4A

405 Property Key - displayname
Display name

408 Property Key - displayicon
Display icon

410 Property Key - version
Comma separated list in form of Major.Minor.Rev.Build version information

415 Property Key - jobbitype
Either application, runtime, or extension
application - Can be added to the jobbi desktop
runtime - Can be added to the list of runtimes
extension - Can be added to the list of extensions

420 Property Key - jobbilocationtype
Either URL | file | jobbi-lookup-server
URL - the jobbi archive is located at the URL specified in jobbilocation
file - the jobbi archive is contained in a file name specified in jobbilocation
jobbi-lookup-server - goes to the jobbi-server and looks up the location information

425 Property Key - jobbilocation
Either a URL, filename, prompt, or ?
URL - URL pointing to jobbi archive
filename - presents the user with a file chooser box specifying the name of the jobbi archive
428 prompt - presents the user with a dialog that either lets the user put in a URL or filename
? - use of this syntax indicates that the jobbi archive is contained within the current archive

430 Property Key - nativecode
Either true or false
true - jobbi archive contains native code
false - jobbi archive does not contain native code

435 Property Key - nativecodeplatform
One of the strings of the jobbi supported platforms
Empty if nativecode is false

440 Property Key - dependencies
Comma separated list of UIFJs that this package depends on
Empty if there are no dependencies for this jobbi package

445 Property Key - main
Java class name of class containing the main() function. This field is only specified if
jobbitype = application
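
As a rough illustration of how a properties layout of this kind might be read and checked, the following sketch loads a few of the keys and applies three rules the figure states (the permitted type values, `main` only for applications, and an empty `nativecodeplatform` when `nativecode` is false). It assumes the properties are stored in `java.util.Properties` key=value syntax and that the type key reads `jobbitype`; the class name `JobbiPropertiesSketch`, its method names, and the sample values are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

final class JobbiPropertiesSketch {

    /** Loads key=value text (java.util.Properties syntax is an assumption) into a Properties object. */
    public static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Splits a comma separated value into trimmed, non-empty entries. */
    public static List<String> splitList(String value) {
        List<String> entries = new ArrayList<>();
        if (value == null) {
            return entries;
        }
        for (String part : value.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                entries.add(trimmed);
            }
        }
        return entries;
    }

    /** Returns one message for each rule from the figure that the properties break. */
    public static List<String> validate(Properties props) {
        List<String> problems = new ArrayList<>();
        String type = props.getProperty("jobbitype", "").trim();
        if (!type.equals("application") && !type.equals("runtime") && !type.equals("extension")) {
            problems.add("jobbitype must be application, runtime, or extension");
        }
        if (!type.equals("application") && !props.getProperty("main", "").trim().isEmpty()) {
            problems.add("main is only specified when jobbitype is application");
        }
        boolean nativeCode = Boolean.parseBoolean(props.getProperty("nativecode", "false").trim());
        if (!nativeCode && !props.getProperty("nativecodeplatform", "").trim().isEmpty()) {
            problems.add("nativecodeplatform must be empty when nativecode is false");
        }
        return problems;
    }

    public static void main(String[] args) {
        // Sample values are invented for illustration.
        String text = String.join("\n",
            "displayname=Example Editor",
            "jobbitype=application",
            "nativecode=false",
            "nativecodeplatform=",
            "dependencies=dep-one, dep-two",
            "main=com.example.editor.Main");
        Properties props = parse(text);
        System.out.println("dependencies: " + splitList(props.getProperty("dependencies")));
        System.out.println("problems: " + validate(props));
    }
}
```

The dependency entries are treated as opaque strings here, since the figure does not make their exact form recoverable.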
SYSTEM AND METHOD FOR IMPROVING
THE MANAGEABILITY AND USABILITY OF
A JAVA ENVIRONMENT

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a computer system, and deals more
particularly with a method, system, and computer-readable code for
improving the manageability and usability of a Java environment.

2. Description of the Related Art

Java is a robust, portable object-oriented programming language
developed by Sun Microsystems, Inc., and which is gaining wide
acceptance for writing code for the Internet and World Wide Web. While
compilers for most programming languages generate code for a particular
operating environment, Java enables writing programs using a "write
once, run anywhere" paradigm. ("Java" and "Write Once, Run Anywhere" are
trademarks of Sun Microsystems, Inc.)

Java attains its portability through use of a specially-designed
virtual machine ("VM"). This virtual machine is also referred to as a
"Java Virtual Machine", or "JVM". The virtual machine enables isolating
the details of the underlying hardware from the compiler used to compile
the Java programming instructions. Those details are supplied by the
implementation of the virtual machine, and include such things as
whether little Endian or big Endian format is used for storing compiled
instructions, and the length of an instruction once it is compiled.
Because these machine-dependent details are not reflected in the
compiled code, the code can be transported to a different environment (a
different hardware machine, a different operating system, etc.), and
executed in that environment without requiring the code to be changed or
recompiled, hence the phrase "write once, run anywhere". The compiled
code, referred to as Java "bytecode", then runs on top of a JVM, where
the JVM is tailored to that specific operating environment. As an
example of this tailoring of the JVM, if the bytecode is created using
little Endian format but is to run on a microprocessor expecting big
Endian, then the JVM would be responsible for converting the
instructions from the bytecode before passing them to the
microprocessor.

Programs written in Java take two forms: applications and applets. Java
applets are applications that are intended to be downloaded to a user's
machine with a Web page, and run within the Web browser that displays
the Web page. Since Java was introduced in 1995, it has gone through a
number of dramatic changes in a very short period of time. During this
evolution, a number of advantages and disadvantages of using
applications versus applets have come to light.

One of the areas of difference between applications and applets is in
the Java runtime environment, as well as the effect of changes thereto.
(The runtime environment includes the JVM, as well as a number of files
and classes that are required to run Java applications or applets.
Hereinafter, the terms "JVM" and "runtime environment" will be used
interchangeably unless otherwise noted.) For applets, only a single
level of the JVM exists in a given version of a browser. In order to
upgrade the JVM level to keep pace with changes to the language, a new
version of the browser must be installed. And, as new levels of browsers
are installed on client machines, developers must update and maintain
the Java code, recompiling (and retesting) it to match the browser's JVM
level. In addition, evolution of the Java language has in some cases
resulted in functionality (such as specific application programming
interfaces, or "APIs") being deprecated between Java levels. This means
that applets written in Java version 1.0.2, while they work in Java
version 1.1, may not work when the browsers adopt the next version,
Java 2. To continue using an applet written in an older Java version
without changing the applet, an older JVM level (and therefore an older
browser) must be used.

While this approach solves the problem of running applets written in
older Java versions, it typically does not enable deployment of new
applets within this browser, because development tools typically cease
supporting generation of code in the older levels. Furthermore, as
defects in existing browser JVMs are identified, applet developers often
create work-arounds while waiting for JVM developers to fix the problem.
Once the fixes are applied, the work-arounds may cause defects in the
applets. In addition, obtaining the latest release of a browser does not
necessarily imply that it will provide the latest release of the JVM
level, as the level of [illegible] released JVM level by 6 to 8 months.
This may mean that applets under development, which will be created
using a development toolkit, are created using a newer JVM level than is
available in the new browser.

For applications, changes to the run-time environment are easier to deal
with, as most Java applications ship bundled together with their own
level of the Java runtime and those that don't state the required level
of the Java runtime. However, shipping a runtime with the application
means that multiple copies of the same JVM level may be installed on the
client, leading to wasted storage space. When the application is not
bundled with its runtime, on the other hand, the user is responsible for
making sure that the correct JVM level is installed and the application
is set up to use that level. Changing the runtime level so that a Java
program can run, and making sure that all system settings are
appropriate for the new level, is a difficult task for an end user to
perform in today's environment. One solution to this problem is to write
Java programs so that they will run correctly across multiple Java
runtime levels. This, however, is a very difficult task for a developer,
and is therefore not a viable solution.

A further issue in the run-time environment for applets is differences
in how browsers from different vendors implement a particular JVM level.
The browsers most commonly used today are Netscape Navigator and
Internet Explorer. Because an applet developer typically has no way of
predicting which browser (or browsers) will be used to run his
application, good development practice calls for testing the applet with
each potential browser. As will be readily apparent, the time spent
testing an applet grows significantly when it is tested for multiple
browsers, multiple JVM levels within each browser, etc. (as well as
possibly testing implementations of the browsers on different operating
system platforms). Sun Microsystems has attempted to address
inter-browser differences (which also provides a way of making the
latest run-time level available for applet execution) by providing a
Java Plug-In which allows applets to be executed using a run-time
environment provided by Sun, instead of the run-time provided by the
browser. A JVM level can be selected from among those supported by the
plug-in. However, this approach requires a user to understand which is
the required JVM level and how to select it. In addition, the plug-in
still provides a single level of a JVM until the user manually selects a
different level, and therefore does not address the problems discussed
above related to differences between JVM levels.

For applications, differences in JVM implementations manifest themselves
differently. Typically, there is only one
version of each JVM level per operating system platform. It may be
easier for a developer to predict which operating system his
applications will run on than it is to predict which browser will be
used for executing applets. Thus, the test and support requirements are
significantly simpler for applications than for applets. Synchronization
between the JVM level used in application development and the JVM level
used for executing the resulting application, as well as the
synchronization problems related to fixing errors, are less likely to
present a problem, compared to the situation for applets that was
discussed above. This is because both the development and runtime
environment for applications are likely to be provided by the same
vendor. In addition, when it is desirable to run an application on an
older JVM (for example, due to changes such as function being
deprecated, as previously discussed), this is less troublesome for an
application than for an applet. The only requirement with the
application scenario is that the older JVM is still available.

Another significant difference between applications and applets is their
ease of use for end-users. Java-enabled browsers make it very easy for a
user to run Java applets, where the user is required to do nothing more
for execution than pointing the browser at the applet and clicking on a
button. The user needs to know very little about the Java language and
applets and may not even realize that an applet is being invoked.
Therefore, users do not need to be trained in how to run Java applets,
saving time and money. Running a Java application (i.e. running a Java
program outside a browser), on the other hand, is considerably more
complicated. A Java application can be run from a development toolkit
such as the JDK (Java Development Kit) product from Sun Microsystems;
alternatively, the application may be run using the "JRE" (Java Runtime
Environment) product (hereinafter, "JRE"), also from Sun Microsystems.
The JRE is a subset of the JDK, providing the functionality which is
required for application execution. Programs are executed from the
command line when using the JRE. Running an application in either the
JDK or JRE requires a fair amount of knowledge about the Java language
and its environment. For example, the linked library paths and classpath
environment variable must be properly set, and may change for each
different application program. A number of dependencies may exist for
running a particular application. For example, if the application makes
use of a Java extension such as the Swing user interface components, the
Swing libraries must be available. If the code for the extensions is not
already installed on a user's machine, it may be difficult for an
average user to locate the code and then perform a proper installation
(i.e. including setting all the required variables to enable the class
loader to find the code at run-time). In addition, a user must
understand how to operate the JDK or JRE for program execution. While
Java developers and system administrators may readily understand these
types of information, it is not reasonable to place this burden on the
average end-user who simply wants to execute a program.

Several problems related to differences between browser implementations
have been discussed. Two additional problems are differences in support
for security features, and differences in archive formats. Security
features are used in an applet by invoking the security APIs provided by
the browser. The primary browsers in use today have different security
APIs. This forces an applet developer to write (and test) security code
that is different between the browsers, increasing the cost of providing
the applet code. While the "CAB" (for "cabinet") file format is used for
distributing and archiving files for the Internet Explorer browser,
"JAR" (for "Java archive") file format is used to distribute and archive
Java applet files.

Accordingly, a need exists for a technique by which these shortcomings
in the Java environment can be overcome. [illegible] the advantages of
applets and the advantages of applications [illegible] combined,
providing an environment which then avoids the disadvantages of each.
The present invention defines [illegible] approach to [illegible] these
problems, which will result in programs that are easier to [illegible]
and less [illegible] to provide.

SUMMARY OF THE INVENTION

An object of the present invention is to provide a technique whereby
shortcomings in the current Java environment can be overcome.

Another object of the present invention is to provide a technique
whereby the advantages of applets and the advantages of applications are
combined, providing an environment which then avoids the disadvantages
of each.

It is another object of the present invention to provide a technique
that enables dynamically switching among run-time environments for Java
programs, on a per-program basis.

It is yet another object of the present invention to provide
[illegible] manner that enables a [illegible] to [illegible] switch
between different environments.

A further object of the present invention is to provide a technique for
specifying the dependencies of a Java application, including which
run-time environment is required.

Yet another object of the present invention is to provide this technique
in a manner that enables the dependencies to be located automatically,
and downloaded and installed, without requiring a static specification
of location information.

Other objects and advantages of the present invention will be set forth
in part in the description and in the drawings which follow and, in
part, will be obvious from the description or may be learned by practice
of the invention.

To achieve the foregoing objects, and in accordance with the purpose of
the invention as broadly described herein, the present invention
provides a method, system, and computer-readable code for use in a
computing environment capable of having a connection to a network, for
improving the manageability and usability of a Java environment. This
technique comprises: defining a plurality of properties for a Java
application, wherein the properties describe the application, zero or
more extensions required for executing the application, and a run-time
environment required for executing the application; and storing the
defined properties along with an identification of the application. This
technique may further comprise installing the application on a client
machine using the stored properties. Preferably, installing the
application further comprises: installing one or more dependencies of
the application, wherein the dependencies comprise the required
extensions and the required run-time environment; and installing a Java
Archive file for the application on the client machine, and this
installing dependencies further comprises: parsing the properties to
locate the dependencies; determining whether each of the dependencies
are already installed on the client machine; and retrieving and
installing the located dependency when it is not already installed.
Optionally, the technique may further comprise retrieving a latest
version of the stored properties for the application prior to operation
of installing the one or more dependencies. The installing one or more
dependencies may further comprise dynamically retrieving a location for
use in said retrieving and installing. In one aspect, the technique may
further comprise creating a registry file on the client machine
corresponding to the properties. In this aspect, the technique
preferably further comprises: receiving a request to execute a selected
application on the client machine; constructing a proper run-time
environment for the selected application using a corresponding registry
file; and starting execution of the selected application in the
constructed environment. The constructing may further comprise: reading
the corresponding registry file to determine current dependencies of the
application, wherein the current dependencies comprise
currently-required extensions and a current run-time environment for the
application; ensuring that each of the current dependencies of the
selected application is installed; setting appropriate environment
variables for the current run-time environment; and setting appropriate
environment variables for the currently-required extensions. Optionally,
the technique may further comprise: updating the current run-time
environment in the registry file; and updating the currently-required
extensions in the registry file. In addition, the technique may further
comprise setting one or more parameters of the selected application
using the corresponding registry file, and may provide for updating the
parameters in the registry file.

The present invention will now be described with reference to the
following drawings, in which like reference numbers denote the same
element throughout.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a computer workstation environment in which
the present invention may be practiced;

FIG. 2 is a diagram of a networked computing environment in which the
present invention may be practiced;

FIG. 3 depicts the technique with which the preferred embodiment of the
present invention associates properties with a Java application, and
stores those properties for later use;

FIG. 4A defines the layout of the properties information used by the
present invention and FIG. 4B depicts an example of using this layout
for a particular application program;

FIGS. 5A and 5B show the logic used in the preferred embodiment to
locate and install dependencies for an application program, and the
logic used in the preferred embodiment to install the Jobbi JAR file for
an application program on a client's computer;

FIG. 6A defines the layout of the registry file used by the present
invention, and FIG. 6B depicts an example of using this layout for a
particular application program;

FIG. 7 depicts the logic invoked in the preferred embodiment when an
application program is launched on a client computer; and

FIG. 8 depicts the logic with which run-time information may be updated
after an application program has been installed.

FIG. 1 illustrates a representative workstation hardware environment in
which the present invention may be practiced. The environment of FIG. 1
comprises a representative single user computer workstation 10, such as
a personal computer, including related peripheral devices. The
workstation 10 includes a microprocessor 12 and a bus 14 employed to
connect and enable communication between the microprocessor 12 and the
components of the workstation 10 in accordance with known techniques.
The workstation 10 typically includes a user interface adapter 16, which
connects the microprocessor 12 via the bus 14 to one or more interface
devices, such as a keyboard 18, mouse 20, and/or other interface devices
22, which can be any user interface device, such as a touch sensitive
screen, digitized entry pad, etc. The bus 14 also connects a display
device 24, such as an LCD screen or monitor, to the microprocessor 12
via a display adapter 26. The bus 14 also connects the microprocessor 12
to memory 28 and long-term storage 30 which can include a hard drive,
diskette drive, tape drive, etc.

The workstation 10 may communicate with other computers or networks of
computers, for example via a communications channel or modem 32.
Alternatively, the workstation 10 may communicate using a wireless
interface at 32, such as a CDPD (cellular digital packet data) card. The
workstation 10 may be associated with such other computers in a local
area network (LAN) or a wide area network (WAN), or the workstation 10
can be a client in a client/server arrangement with another computer,
etc. All of these configurations, as well as the appropriate
communications hardware and software, are known in the art.

FIG. 2 illustrates a data processing network 40 in which the present
invention may be practiced. The data processing network 40 [illegible]
may include a plurality of individual workstations 10. Additionally, as
those skilled in the art will appreciate, one or more LANs may be
included (not shown), where a LAN may comprise a plurality of
intelligent workstations coupled to a host processor.

Still referring to FIG. 2, the networks 42 and 44 may also include
mainframe computers or servers, such as a gateway computer 46 or
application server 47 (which may access a data repository 48). A gateway
computer 46 serves as a point of entry into each network 44. The gateway
46 may be preferably coupled to another network 42 by means of a
communications link 50a. The gateway 46 may also be directly coupled to
one or more workstations 10 using a communications link 50b, 50c. The
gateway computer 46 may be implemented utilizing an Enterprise Systems
Architecture/370 available from IBM, or an Enterprise Systems
Architecture/390 computer, etc. Depending on the application, a midrange
computer, such as an Application System/400 (also known as an AS/400)
may be employed. ("Enterprise Systems Architecture/370" is a trademark
of IBM; "Enterprise Systems Architecture/390", "Application System/400",
and "AS/400" are registered trademarks of IBM.)

The gateway computer 46 may also be coupled 49 to a storage device (such
as data repository 48). Further, the gateway 46 may be directly or
indirectly coupled to one or more workstations 10.

Those skilled in the art will appreciate that the gateway computer 46
may be located a great geographic distance from the network 42, and
similarly, the workstations 10 may be located a substantial distance
from the networks 42 and 44. For example, the network 42 may be located
in California, while the gateway 46 may be located in Texas, and one or
more of the workstations 10 may be located in New York. The workstations
10 may connect to the wireless network 42 using the Transmission Control
Protocol/
Internet Protocol ("TCP/IP") over a number of alternative the client and server, although the HTTP (Hyper Text
connection media, such as cellular phone, radio frequency Transfer Protocol) protocol running on TCP/IP is used 
networks, satellite networks, etc. The wireless network 42 herein as an example when discussing these message flows.
preferably connects to the gateway 46 using a network The present invention addresses a number of shortcom- 
connection 50a such as TCP or UDP (User Datagram ings in the Java environment, as will be discussed herein.
Protocol) over IP, X.25, Frame Relay, ISDN (Integrated The present invention also enables applets to be run without 
Services Digital Network), PSTN (Public Switched Tele- use of a browser, as if the applet was an application (and 
phone Network), etc. The workstations 10 may alternatively therefore all executable programs will be referred to here- 
connect directly to the gateway 46 using dial connections inafter as "applications"). In this manner, the advantages of
50b or 50c. Further, the wireless network 42 and network 44 applets and the advantages of applications are combined. In
may connect to one or more other networks (not shown), in particular, the disadvantages discussed earlier related to
an analogous manner to that depicted in FIG. 2. synchronizing applet code with the JVM level in a browser, 

Software programming code which embodies the present different security APIs to invoke per browser, and the need 

invention is typically accessed by the microprocessor 12 of to support multiple file archival formats are avoided by no 

the workstation 10 and server 47 from long-term storage longer using the browser as an execution environment. The

media 30 of some type, such as a CD-ROM drive or hard acronym Jobbi, which stands for "Java code Outside the

drive. The software programming code may be embodied on Browser By IBM", is used herein to refer to the imple-

any of a variety of known media for use with a data mentation of the present invention. Each application in Jobbi 

processing system, such as a diskette, hard drive, or has a properties file associated with it. The information in the 

CD-ROM. The code may be distributed on such media, or properties file is used to describe the requirements of an

may be distributed to users from the memory or storage of application, much as an applet tag would describe an 

one computer system over a network of some type to other applet's requirements in the current art. Using this properties 

computer systems for use by users of such other systems. information, each application program can specify its 

Alternatively, the programming code may be embodied in dependencies, including the particular runtime environ-

the memory 28, and accessed by the microprocessor 12 ment that the application should run on, as well as the

using the bus 14. The techniques and methods for embody- environment settings that are required for running the appli- 

ing software programming code in memory, on physical cation. Multiple run-time environments (i.e. multiple ver- 

media, and/or distributing software code via networks are sions of a JVM or JRE) can exist on a client machine, where 

well known and will not be further discussed herein. a single (shareable) copy of each run-time is accessible to 

A user of the present invention may connect his computer application programs which need it. A technique is

to a server using a wireline connection, or a wireless defined herein for dynamically switching to the run-time 

connection. Wireline connections are those that use physical environment which is required for a particular application, 

media such as cables and telephone lines, whereas wireless using information in the properties file. This is accomplished 

connections use media such as satellite links, radio fre- with little or no input from a human user. In this manner,

quency waves, and infrared waves. Many connection tech- older JVM levels are just as easily accessible for use at

niques can be used with these various media, such as: using run-time as newer levels, freeing application developers

the computer's modem to establish a connection over a from the need to update, recompile, and retest applications

telephone line; using a LAN card such as Token Ring or just to keep up with the moving target of the JVM level 

Ethernet; using a cellular modem to establish a wireless within the most recently released browser or operating 

connection; etc. The user's computer may be any type of system platform run-time environment.

computer processor, including laptop, handheld or mobile The preferred embodiment of the present invention will 

computers; vehicle-mounted devices; desktop computers; now be discussed in more detail with reference to FIGS. 3 

mainframe computers; etc., having processing and commu- through 8. 

nication capabilities. The remote server, similarly, can be FIG. 3 illustrates the technique with which the preferred 

one of any number of different types of computer which 45 embodiment of the present invention associates properties 

have processing and communication capabilities. These with a Java application, and stores those properties for later 

techniques are well known in the art, and the hardware use. The format of the properties information defined by the 

devices and software which enable their use are readily present invention, and an example of using this information 

available. Hereinafter, the user's computer will be referred for a particular application, is shown in FIGS. 4A and 4B, 

to equivalently as a "workstation", "device**, or "computer", 50 respectively. 

and use of any of these terms or the term "server" refers to The properties definition process shown in FIG. 3 is a 
any of the types of computing devices described above. stand-alone process performed by an application developer, 
In the preferred embodiment, the present invention is and is preferably integrated into the normal application build 
implemented as a computer software program. Availability process which the developer uses during application devel- 
of a network connection is assumed, which must be operable 55 opment. Block 300 indicates that the developer performs 
at the time when the dynamic loading software on a user's this normal build process, which will use techniques that are 
workstation is invoked. In the preferred embodiment, the known in the art. The output of this build process is a Java 
present invention is implemented as one or more modules Archive file, also known as a "JAR" file, as indicated at 
(also referred to as code subroutines, or "objects" in object- Block 305. The present invention does not change the JAR 
oriented programming). The server to which the client 60 file content Block 310 shows the "Jobbi packager" of the 
computer connects may be functioning as a Web server, present invention being invoked. This packaging step is 
where that Web server provides services in response to illustrated in more detail at Blocks 320 and 325. At Block 
requests from a client connected through the Internet. 320, the developer specifies values for the applicable prop- 
Alternatively, the server may be in a corporate intranet or erties of the application, using his knowledge about the 
extranet of which the client's workstation is a component. 65 application's requirements. These properties include appli- 
The present invention operates independently of the com- cation dependencies, run-time requirements, etc., as will be 
munications protocol used to send messages or files between described in more detail below with reference to FIG. 4. A 
new JAR file, designated a Jobbi JAR file, is created as a result at Block 325. This Jobbi JAR file is then stored in an archive for later use, as depicted at Block 315 of the mainline flow, in a manner similar to the way in which existing JAR files are stored. As shown in FIG. 3, the Jobbi JAR file that will be archived at Block 315 includes the JAR file created according to the prior art for the application (as indicated by element 330), and also the Jobbi properties information (indicated by element 335). From this archive, all information needed to operate the application in a Java environment is available. (Note that while FIG. 3 indicates storing both the existing archive data 330 and the Jobbi information 335 together, this is merely one technique that may be used. Alternatively, these two types of information can be separately stored without deviating from the present invention, provided that the Jobbi properties are associated with the archived application information. This association may be implemented, for example, by storing a pointer or other reference in the JAR file, which identifies the associated Jobbi properties file; or, such a pointer or reference may be stored in the Jobbi properties file, pointing back to the JAR file.)

In an alternative embodiment, the operations depicted in Block 310, 320, and 325 could be separated in time from the operation of the normal build process and JAR file creation, without deviating from the inventive concepts of the present invention. When this alternative approach is used, the JAR file created at Block 305 would be stored as in the current art. When the properties information is subsequently created for the application, the stored JAR file for the application would be located, and the properties information would then be associated or stored therewith, as described with reference to FIG. 3.

FIG. 4A defines the properties information and file layout 400 contemplated by the preferred embodiment of the present invention. The layout 400 will be explained with reference to the example 450 of FIG. 4B, which illustrates the properties information for a hypothetical application. The properties information defined in this layout provides a standardized means for packaging not only applications, but also Java run-times and Java extensions. Different types of information are pertinent to each of these types of packaged content. Ten different types of property values are shown in the layout, at elements 405, 408, 410, 415, 420, 425, 430, 435, 440, and 445. (While these ten types of information are used for the preferred embodiment, it will be obvious to one of ordinary skill in the art that additional or different values may be used in a proper setting, without deviating from the inventive concepts disclosed herein. In addition, other names may be used for the properties instead of those shown, and the order of entries may be changed from the order shown.)

Some of the information in the properties file describes the application, run-time, or extension, and other information describes its dependencies. Each individual entry will now be discussed. The first element 405 is the display name to be used for this application. The display name may be used, for example, in displaying the icon which the user will use to invoke the application. For a hypothetical application "X1", as indicated at element 455 of FIG. 4B, the display name entry syntax is shown at 459. The name of the property appears first, which in this case is "display name". The value for the property is shown as "x1's display name". For a subsequent entry 480 in the properties file, which pertains to a run-time named "Y1" (see comment entry 480), the display name 481 is illustrated as "y1's display name". (Note that in the example of FIG. 4B, the equal sign has been used to separate the property names from their value. This is merely one separator that could be used.) Properties entry 408 specifies a display icon to be used for application files, an example of which is illustrated at 460. The value of this entry will be a file name where an icon is stored as an image, bitmap, etc. Within this property value, the "|" character is used in the preferred embodiment to indicate the path separator, and has been used in the path specification "images|hod.gif" of element 460. (This "|" symbol will be replaced on installation of the registry into the client machine, as further discussed below with reference to FIG. 6.)

The next property entry 410 is for the version of the packaged item. In the preferred embodiment, this information will be specified as a comma-separated list comprising the major, minor, revision, and build version numbers. This version information identifies which particular version of an application, run-time, or extension the properties information pertains to. For application X1 of FIG. 4B, the version syntax is shown at element 461. For run-time Y1, the version is shown at 482. Version syntax other than the comma-separated list of the preferred embodiment may be used when appropriate, to match the syntax used in a particular installation.

Property entry 415 identifies the type of information being described by this entry in the properties file. The type may be application, runtime, or extension. As shown at 462 and 483 of FIG. 4B, the keywords "application" and "runtime" have been used. Alternatively, other techniques may be used to convey the type, such as assigning numeric values (such as 0 through 2) to the packaged items.

The location type of the packaged item is specified using the property entry at 420. This location type 420 is used along with the location entry 425. As described in FIG. 4A, the location type may be "URL" (Uniform Resource Locator), in which case the location entry specifies a network location from which the archived package item can be retrieved. An example of using a URL is shown at elements 484 (where the type is identified using the "URL" keyword) and 485 (where an example URL is specified) of FIG. 4B. The location type 420 may alternatively be "file", in which case the location 425 specifies information for a directory structure in which the file is located. An example of using a file location is shown at elements 463 and 464 of FIG. 4B. In the example location shown at 464, the special character "." has been used to indicate that the archived information is in the current JAR file. In that case, there is no need to specify a location value 425 in the properties file. (While the preferred embodiment uses the special character "." to indicate this situation, other techniques such as a special keyword may alternatively be used.) When referring to archived information within a JAR file, a unique identifier or "UID" that uniquely identifies the information within the JAR file may be used. The location type 420 may also be specified as "jobbi-lookup-server", which means that the location will be determined dynamically at run-time (as will be discussed below with reference to FIG. 5B). In this case, the location value 425 will preferably be left empty. The location 425 may be specified using the keyword "prompt", as indicated at 428. In this case, the user will be prompted to enter the location information. (In the preferred embodiment, the keyword for the property type will appear in the property file, followed by the separator syntax, without an associated value when the value is empty.) As described above with reference to entry 415, shorter identifiers such as numeric values may be used instead of the keywords for entry 420.
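The entry syntax discussed so far (a property name, the "=" separator, and a value that may be empty) can be illustrated with a minimal Java sketch. It is not the packager of the present invention; the class and method names are hypothetical, and the property names in any input are assumptions, since the keywords of FIG. 4A are not reproduced in the text.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; class and method names are illustrative assumptions.
class JobbiPropertiesSketch {

    // Parses "name=value" lines; an entry with nothing after '=' keeps an empty value.
    static Map<String, String> parse(String text) {
        Map<String, String> entries = new LinkedHashMap<>();
        for (String line : text.split("\n")) {
            line = line.trim();
            int eq = line.indexOf('=');
            if (line.isEmpty() || eq < 0) {
                continue; // skip blank lines and lines without the separator
            }
            entries.put(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
        }
        return entries;
    }

    // Splits a comma-separated version into major, minor, revision, and build numbers.
    static int[] parseVersion(String value) {
        String[] parts = value.split(",");
        if (parts.length != 4) {
            throw new IllegalArgumentException("expected major,minor,revision,build: " + value);
        }
        int[] version = new int[4];
        for (int i = 0; i < 4; i++) {
            version[i] = Integer.parseInt(parts[i].trim());
        }
        return version;
    }

    // Replaces the neutral '|' path separator with the separator of the target platform.
    static String toPlatformPath(String value, char platformSeparator) {
        return value.replace('|', platformSeparator);
    }
}
```

For instance, toPlatformPath("images|hod.gif", '/') yields "images/hod.gif", which corresponds to the replacement of the "|" symbol performed when the registry is installed on a client machine.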
Property entry 430 specifies whether the archived package item contains native code. This entry 430 is used in conjunction with entry 435, which specifies a string identifier of the native code platform when the value of entry 430 is "true". Examples of using these entries are shown at elements 465 and 466, where the native code value is false, and at elements 486 and 487, where the native code value is true.

Dependencies for this package item are specified using property entry 440. In the preferred embodiment, the dependency syntax uses a comma-separated list of UIDs for code that must be installed for this package item to run on any single platform. If there are no dependencies, then the value for this property will preferably be left empty. Element 467 shows an example of dependency information, where two identifiers "X2" and "Y1" are listed. As shown at elements 470 and 480, property information will also be provided for these dependencies. Element 488 shows that the example run-time Y1 has no dependencies. By specifying the dependencies within the archived package, all Java-specific information needed to install and use the package is available. For example, an application such as X1 (see element 455 of FIG. 4B) specifies the run-time environment "Y1" which it needs as a dependency at 467.

Property entry 445 defines the final entry of the preferred embodiment, which is the Java class name of the class containing the main( ) function. This entry has an empty value unless the package item is an application. The value listed for this entry will be used at run-time to launch the application. As shown for application X1 at element 468, the main function for this application is located in the class "com.ibm.X1".

FIG. 5A shows the logic used in the preferred embodiment to locate and install dependencies for an application program, and FIG. 5B shows the logic used in the preferred embodiment to install the Jobbi JAR file for an application program on a client's computer. As depicted at 500, the logic of the dependency installation operates on the client's computer. This logic may be invoked in a stand-alone manner, to ensure that the dependencies for an application are installed. Alternatively, it may be invoked during execution of an application, as will be further discussed below with reference to Block 730 of FIG. 7. Dependency installation for a particular application is requested at Block 505. In the preferred embodiment, an HTTP request is sent 506 to a Web server 510, where that Web server 510 stores Jobbi JAR files 515, 516, 517, etc. which were created using the technique described above with reference to FIG. 3. This HTTP request will specify a unique identifier for the requested application. The associated Jobbi JAR file will be located by the Web server 510, and returned 507 to the client machine. Block 520 indicates that this file is received at the client machine, and is subsequently processed at Block 525. The processing of the Jobbi JAR file is explained in further detail in Blocks 535 and 540. The properties information (see FIG. 4) is parsed at Block 535 to locate the dependencies for the application requested at Block 505. When the dependency information is extracted, Block 540 checks to see whether the dependent item is installed. This checking process may have 3 outcomes, indicated at flows 541, 542, and 543. If the dependency (or dependencies) is/are already installed, then processing continues at Block 530 as indicated by flow 542. If there are dependencies, and the location of the dependency is known, then control returns to Block 505 as indicated by flow 541. As will be obvious to one of ordinary skill in the art, this is a recursive invocation of the dependency installation process. This recursive invocation may be used to retrieve the run-time needed for the requested application, extensions needed for the application, etc. If a dependency is needed but its location is not known (for example, the value of property entry 420 is specified as "jobbi-lookup-server"), then the unique identifier of the dependency will be sent to a lookup server 545, as indicated at flow 543. This lookup server 545 will return the associated Jobbi JAR file location to the client machine, as indicated by flow 544. In the preferred embodiment, this returned information is a URL specifying the location of the server where the archived information is stored. Now that the location of the dependency is known, a recursive invocation of the dependency installation process is invoked using flow 541. This technique of determining the location of a dependency dynamically enables the present invention to provide packaging that is very flexible, as contrasted to the current art where such location information must be statically pre-specified. (Note that while a single Web server 510 is depicted in FIG. 5A, this is for illustrative purposes. More than one server may be used, where the HTTP request 506 specifies the appropriate server using its URL.)

The dependency checking process of Block 540 will be repeated for each specified dependency. The determination of whether a dependency is already installed uses techniques which are known in the art. Once the dependencies have been fully processed as described with reference to Blocks 535 and 540, control returns to the mainline processing in Block 530, where the Jobbi JAR file retrieved at 507 is installed into a Jobbi registry on the client machine. This process is described in more detail in FIG. 5B.

The installation process used to install the Jobbi JAR file for an application program on a client's computer begins at Block 570 of FIG. 5B, where the application's JAR file is extracted from the Jobbi JAR file (when the two have been stored together, see element 315 of FIG. 3) and copied to the UID directory of the client machine. (The UID directory is a directory created on the client's machine, having the same name as the UID of the Jobbi archive whose contents are contained therein. This technique facilitates finding the Jobbi archive information when subsequently setting up the environment for the application. Alternatively, other directory naming approaches may be used, in which case the name of the directory used to store the archived information for an application must be stored as part of the registry file.) Block 575 then uses the Jobbi properties information from the Jobbi JAR file, and creates a registry file from this properties information. The format of the registry file is depicted in FIG. 6, and is discussed in detail below. The registry file information 580 is then stored at Block 585 in the registry directory for this client machine. The process of FIG. 5B then ends.

FIG. 6A defines the layout 600 of the registry file used by the present invention, and FIG. 6B depicts an example 670 of using this layout for a particular application program. Whereas the properties file contains information needed to use an application on a number of platforms on which it may be installed, the registry provides tailored information about using the application on this particular client machine. At run-time, the registry information will be used to construct the proper environment for the application, as will be discussed below with reference to FIG. 7. A number of registry entries are extracted directly from the properties file information during the processing of Block 575 of FIG. 5B, as will be described herein. (In a similar manner as described with reference to the property file layout 400 in FIG. 4A, the registry file layout 600 depicts the preferred embodiment, and may pertain to an application, a runtime, or extensions. This layout information may be changed or extended in a
proper environment, it may be reordered, and other keywords may be used instead of those shown.)

The registry entry 605 specifies a unique identifier of this package, which can be any valid string. The identifier may be generated, e.g., by invoking the function of a random number generator, or by other techniques (including user input) which do not form part of the present invention. An example identifier string is shown at element 671 of FIG. 6B. Note that the registry entries have been prefixed with the syntax "jobbi" in this example: this is for illustrative purposes only, and clearly indicates that these are entries in the Jobbi registry.

Registry entry 610 is used to specify the Java class containing the main( ) function. This information is extracted from entry 445 of the properties file, during the processing of Block 575 of FIG. 5B, provided that the properties file is that of an application. An example of using this entry 610 is shown at element 672 of FIG. 6B, where a particular class name is specified. Registry entry 615 is the display name of this package, and can be any displayable string value. This value is extracted from properties entry 405, and is illustrated in the example 670 at element 673. Entry 620 specifies a display icon to be used for application files, an example of which is illustrated at 674. The value of this entry will be a file name where an icon is stored as an image, bitmap, etc. As indicated in the note at the bottom of FIG. 6A, the "|" character is used in the preferred embodiment to indicate the path separator, and has been used in the path specification "images|hod.gif" of element 674. This "|" symbol will be replaced on installation of the registry into the client machine to use the "\" or "/" character, as appropriate for the client's operating system. The value of this entry 620 is extracted from element 408 of the properties file.

The relative directory or archive name which needs to be included in the classpath environment variable when this package is used is specified using registry entry 625. The value used for this entry is created automatically, and is the location in the file system of the client machine where the archived package is stored. An example is shown at 675. Note that a special symbol has been used, indicating that the platform-specific symbol for concatenation is to be used to replace this symbol on installation of the registry on the client's machine (as explained in FIG. 6A).

Registry entry 630 specifies a list of unique identifiers of run-time environments in which this package can be executed. This list may be initially constructed using information from the properties file, where the dependencies for the package are inspected to locate each runtime for the package. In addition, user or developer input may be used subsequently, to extend the values in this list. Element 676 shows that in this example application, any of three different run-time environments (identified in the example by their unique identifiers) may be used to execute the application.

The relative working directory which needs to be set as an environment variable is specified as the value of registry entry 635. The special syntax "." is used in the example at 677, indicating the current directory is to be used. This value is preferably created by initializing it to the value ".", and providing a means (such as a configuration menu) with which the value can subsequently be changed if needed.

The type of item described by the registry information is specified using entry 640, and will be either application, extension, or run-time. Element 678 indicates that the example pertains to an application, using the application value of 0. The value of this entry is deduced from the properties file entry 415.

Registry entry 645 specifies the currently-selected run-time environment to be used for an application. This entry is omitted from the registry file for run-times and extensions. An example run-time identifier is shown at 679, which corresponds to the final choice from the list of run-times in element 676. The value to be used for entry 645 will be chosen from the values in entry 630. Preferably, a selection policy will be used for a particular implementation, such as choosing the final element from a list of choices, or choosing the first element, etc. Alternatively, a user may be prompted to select from the list.

The package entry 650 specifies the archive name for the package, identifying where it is stored on a server in the network using a URL or where it is stored in a file system using a file path. If the archived package has been expanded and installed into the UID directory of the client's machine, then the value of this entry is left blank, as shown by element 680.

The extern entry 655 is a semi-colon separated list of the unique identifiers on which this package is dependent, and is created from entry 440 of the properties file. When there are no dependencies, then this entry may be completely omitted from the registry file, as has been done in the example 670.

The final entry in the preferred embodiment of the registry file is the parameters entry 660. This is a list of parameters, preferably separated by semicolons, which will be passed to the main( ) invocation of the application. Accordingly, this entry is not specified unless the package is an application. Typically, the parameter values will be entered by the user during the application launch process, and thus no parameter values will be stored persistently in the registry file. However, it may be that one or more parameters has a somewhat constant or fixed value. In that case, the value(s) may be stored in the registry, avoiding the need to prompt the user to enter the values at run-time. The example 670 omits specification of parameter values.

FIG. 7 depicts the logic invoked in the preferred embodiment when an application program is launched on a client computer. The process begins at Block 700, when a user requests to execute an application, and takes place on the client machine (as noted at element 710). The manner in which the user requests the application does not form part of the present invention. The user may click on an icon representing the application (such as the display icon 620 identified in the registry file), select an application identifier (such as the display name 615) from a pop-up or pull-down list, launch the application using timer-driven lists, etc. At the next block, the process of constructing the appropriate run-time environment for the application, and starting the application executing in it, begins. As illustrated, the user is required to know little or nothing about how the run-time environment operates. Block 715 opens the registry file associated with the requested application, which contains information as described with reference to FIGS. 6A and 6B. Platform-specific logic will be invoked at Blocks 720 and 725 to process the environment data from the registry file. For example, the classdir entry 625 will be appended to the classpath variable, and the working directory will be set using information from entry 635. Block 730 then checks the dependency list 655, to ensure that all dependencies are installed. If not, then the installation process of FIG. 5 will preferably be invoked. For
those dependencies which are installed, a recursive invoca-
tion of the logic in FIG. 7 is performed as shown at 732,
setting the appropriate information for using the dependen- 
cies. When all dependency information has been processed, 
control transfers to Block 735. The current run-time entry 
645 of the application's registry file is extracted, and the 
value is used to retrieve the registry file for that run-time. 
Block 740 then uses information from the run-time's 
registry, and uses the appropriate settings for environment 
variables (such as appending directories to the libpath and 
binpath, etc.).
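
The dependency handling of Block 730 and the recursive step at 732 can be sketched as follows. This is only an illustration under stated assumptions: the registry is modeled as an in-memory map, the class names are hypothetical, the installer function stands in for the installation process of FIG. 5, and the guard against visiting a package twice is an addition that the text does not describe.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical in-memory stand-in for one registry file.
class RegistryEntrySketch {
    final String classDir;      // value of the classdir entry 625
    final List<String> extern;  // unique identifiers from the extern entry 655

    RegistryEntrySketch(String classDir, List<String> extern) {
        this.classDir = classDir;
        this.extern = extern;
    }
}

class DependencyWalkSketch {

    // Collects the class directories of a package and of everything it depends on.
    // 'installer' is called only for identifiers that are missing from 'registry'.
    static List<String> collectClassDirs(String uid,
                                         Map<String, RegistryEntrySketch> registry,
                                         Function<String, RegistryEntrySketch> installer) {
        Set<String> dirs = new LinkedHashSet<>();
        walk(uid, registry, installer, new LinkedHashSet<>(), dirs);
        return new ArrayList<>(dirs);
    }

    private static void walk(String uid,
                             Map<String, RegistryEntrySketch> registry,
                             Function<String, RegistryEntrySketch> installer,
                             Set<String> visited,
                             Set<String> dirs) {
        if (!visited.add(uid)) {
            return; // already processed; this guard is an addition to avoid cycles
        }
        RegistryEntrySketch entry = registry.get(uid);
        if (entry == null) {
            entry = installer.apply(uid); // dependency not installed yet
            registry.put(uid, entry);
        }
        dirs.add(entry.classDir);
        for (String dependency : entry.extern) {
            walk(dependency, registry, installer, visited, dirs); // recursive step
        }
    }
}
```

The returned list preserves the order in which packages were reached, so the directory of the requested application comes first, followed by the directories of its dependencies.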

At Block 745, a platform-dependent JNI (Java Native 
Interface) is invoked. As is known in the art, the JNI is a 
standard, virtual machine independent interface used to 
enable Java applications to call native libraries of code 
written in other languages such as C or C++. The appropriate 
environment variables and application parameters are 
passed on this invocation, enabling Block 750 to finalize the 
setting of environment variables and then start the system 
process with the application program executing within it. 
The process of FIG. 7 then ends, and the program executes 
normally. As has been demonstrated, the novel techniques of 
the present invention enable the proper run-time to be used 
for an application, which may include changing the run-time 
dynamically as each different application is selected for
execution. 
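
As a rough illustration of the final launch step, the sketch below prepares a child process with a classpath, working directory, and parameters taken from registry values. It deliberately substitutes the standard java.lang.ProcessBuilder class for the platform-dependent JNI invocation of Block 745, so it shows the same idea by a different mechanism. The class name, the method name, and the way the Java executable is supplied are assumptions.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

class LaunchSketch {

    // Prepares, but does not start, a process for the application's main class.
    // 'javaExecutable' is assumed to come from the registry of the selected run-time.
    static ProcessBuilder prepare(String javaExecutable, String mainClass,
                                  List<String> classDirs, String workingDir,
                                  List<String> parameters) {
        List<String> command = new ArrayList<>();
        command.add(javaExecutable);
        command.add(mainClass);
        command.addAll(parameters);

        ProcessBuilder builder = new ProcessBuilder(command);
        builder.directory(new File(workingDir));

        // Append the class directories to any existing CLASSPATH value, joined
        // with the platform's path-list separator (';' or ':').
        String joined = String.join(File.pathSeparator, classDirs);
        String existing = builder.environment().get("CLASSPATH");
        builder.environment().put("CLASSPATH",
                existing == null || existing.isEmpty()
                        ? joined
                        : existing + File.pathSeparator + joined);
        return builder;
    }
}
```

Calling start() on the returned builder would launch the application; the sketch stops short of that so that the prepared command and environment can be inspected first.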

The run-time environment for an application can be easily 
changed using the present invention, according to the logic 
depicted in FIG. 8. At Block 800, user input is entered from 
a graphical user interface (GUI), command line, etc.,
requesting to change information in the registry file. 
(Alternatively, means may be provided with which a systems 
administrator can force information updates on one or more 
client machines. For example, if a new run-time environ-
ment is being downloaded throughout an organization, the
systems administrator may update all client registry files to 
use this new run-time. This approach will be useful to further 
reduce the amount of run-time knowledge required for the 
end users. Means for downloading information from a 
network location to client machines are known in the art, and
will be used to invoke the logic of FIG. 8.) If the user request 
is to update the current run-time entry 645 in the registry, 
then Block 805 will accept the new run-time identifier from 
the user. Optionally, verification of this identifier may be 
performed. If the user request is to change persistently-
stored application parameters 660, Block 810 will accept the
new parameter values. Optionally, the parameter values may 
be verified by inspecting the applicable application to ensure 
that the number and type of parameter values is appropriate. 
If the user requests to change or add dependency informa-
tion 655, then Block 815 will accept the new information.
Optionally, the stored list of dependency identifiers may be 
presented to the user, along with means for identifying 
additions, deletions, and changes to this list. Once the user 
has entered the changed registry information, and any
optional verifications have been performed, Block 820 
updates the stored registry information for this application. 
The next time this application is launched, the revised 
information will be used when constructing the execution 
environment according to FIG. 7. Thus, it can be seen that 
changing an application program so that it uses a different 
run-time environment is greatly simplified as contrasted to 
the current art. 
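
The update of the current run-time entry, together with the optional verification mentioned above, might look like the following minimal sketch. It assumes the registry is held in a java.util.Properties object; the two key names and the use of a semicolon to separate the run-time list are assumptions, since the keywords of FIG. 6A are not reproduced in the text.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

class RegistryUpdateSketch {

    // Hypothetical keys standing in for registry entries 630 and 645.
    static final String RUNTIMES = "jobbi.runtimes";
    static final String CURRENT_RUNTIME = "jobbi.currentruntime";

    // Sets the current run-time after checking it against the permitted list.
    static void setCurrentRuntime(Properties registry, String runtimeId) {
        List<String> allowed = Arrays.asList(registry.getProperty(RUNTIMES, "").split(";"));
        if (!allowed.contains(runtimeId)) {
            throw new IllegalArgumentException("run-time not listed for this package: " + runtimeId);
        }
        registry.setProperty(CURRENT_RUNTIME, runtimeId);
    }

    // One possible selection policy: choose the final element of the list.
    static String defaultRuntime(Properties registry) {
        String[] ids = registry.getProperty(RUNTIMES, "").split(";");
        return ids[ids.length - 1];
    }
}
```

The changed Properties object would then be written back to the registry file, so that the next launch of the application picks up the new run-time.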

While the preferred embodiment of the present invention 
has been described, additional variations and modifications 
in that embodiment may occur to those skilled in the art once 
they learn of the basic inventive concepts. Therefore, it is 
intended that the appended claims shall be construed to
include both the preferred embodiment and all such varia- 
tions and modifications as fall within the spirit and scope of 
the invention. 
We claim: 

1. In a computing environment capable of having a 
connection to a network, computer readable code readable 
by a computer system in said environment and embodied on 
one or more computer-readable media, for installing a Java 
application on a client machine, comprising: 

a subprocess for automatically retrieving, responsive to an 
execution request on said client machine, a properties 
file for a Java application to be installed, wherein said 
properties file (1) describes said Java application, (2) 
specifies zero or more executable extensions which are 
required for executing said Java application, and (3) 
specifies a run-time environment which is required for 
executing said Java application; 
a subprocess for installing said Java application on said 

client machine using said properties file; and 
a subprocess for automatically installing, by said client 
machine, one or more dependencies of said Java 
application, wherein said dependencies comprise said 
required executable extensions and said required run- 
time environment, further comprising: 
a subprocess for parsing said properties file to locate 

said dependencies; and 
for each of said located dependencies which is not 
already installed on said client machine, a subpro- 
cess for automatically recursively (1) retrieving a 
properties file for said located dependency, (2) 
installing said located dependency, and (3) installing 
any dependencies identified when parsing said 
retrieved properties file of said located dependencies, 
provided said identified dependency is not already 
installed on said client machine. 

2. Computer readable code according to claim 1, further 
comprising: 

a subprocess for revising said properties file for said Java 
application, wherein said subprocess for automatically 
retrieving and said subprocess for automatically install- 
ing then use said revised properties file. 

3. Computer readable code according to claim 1, wherein 
said subprocess for automatically installing one or more 
dependencies further comprises a subprocess for dynami- 
cally determining, for at least one of said dependencies, a 
location from which said at least one dependency is to be 
installed. 

4. Computer readable code according to claim 1, further 
comprising: 

a subprocess for creating a registry file on said client 
machine, wherein said created registry file contains 
entries corresponding to said properties file, said entries 
being tailored to said client machine; and 

a subprocess for using said created registry to construct 
said run-time environment for said Java application on 
said client machine. 

5. Computer readable code according to claim 4, further 
comprising: 

a subprocess for receiving a request to execute a selected 
Java application on said client machine; 

a subprocess for constructing a proper run-time environ- 
ment for said selected Java application using its corre- 
sponding registry file; and 

a subprocess for starting execution of said selected Java 
application in said constructed environment. 
6. Computer readable code according to claim 5, wherein 
said subprocess for constructing further comprises: 

a subprocess for reading said corresponding registry file 
to determine current dependencies of said Java 
application, wherein said current dependencies com- 
prise currently-required extensions and a current run- 
time environment for said Java application; 

a subprocess for ensuring that each of said current depen- 
dencies of said selected Java application is installed; 

a subprocess for setting appropriate environment vari- 
ables for said current run-time environment; and 

a subprocess for setting appropriate environment vari- 
ables for said currently-required extensions. 
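
By way of illustration only, the construction of a run-time environment from a registry file, as recited in claims 5 and 6, might be sketched as follows. The layout of `REGISTRY`, the paths, and the function names are hypothetical. The choice of the `JAVA_HOME` and `CLASSPATH` environment variables is likewise an assumption of this sketch; the claims do not name particular variables. For brevity the sketch reports a missing dependency rather than installing it.

```python
import os

# Hypothetical registry entries, tailored to one client machine.
REGISTRY = {
    "runtime": {"name": "jre-1.1", "home": "/opt/jre-1.1"},
    "extensions": [
        {"name": "ext-a", "jar": "/opt/ext/ext-a.jar"},
        {"name": "ext-b", "jar": "/opt/ext/ext-b.jar"},
    ],
}


def current_dependencies(registry):
    """Read the registry to determine the current run-time and extensions."""
    names = [registry["runtime"]["name"]]
    names += [ext["name"] for ext in registry["extensions"]]
    return names


def build_environment(registry, installed, base_env=None):
    """Return environment variables for launching the application."""
    # Ensure that each current dependency is installed (here: fail if not).
    missing = [d for d in current_dependencies(registry) if d not in installed]
    if missing:
        raise RuntimeError("dependencies not installed: " + ", ".join(missing))
    env = dict(base_env or {})
    # Variable for the current run-time environment (assumed name).
    env["JAVA_HOME"] = registry["runtime"]["home"]
    # Variable for the currently-required extensions (assumed name).
    env["CLASSPATH"] = os.pathsep.join(ext["jar"] for ext in registry["extensions"])
    return env


env = build_environment(REGISTRY, installed={"jre-1.1", "ext-a", "ext-b"})
```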

7. Computer readable code according to claim 6, further 
comprising a subprocess for setting one or more parameters 
of said selected Java application using values specified in 
said corresponding registry file. 

8. Computer readable code according to claim 7, further 
comprising a subprocess for updating said parameters in said 
registry file. 

9. Computer readable code according to claim 4, further 
comprising: 

a subprocess for updating said current run-time environ- 
ment in said registry file; and 

a subprocess for updating said currently-required exten- 
sions in said registry file. 

10. A system for installing a Java application on a client 
machine in a computing environment capable of having a 
connection to a network, comprising: 

means for automatically retrieving, responsive to an 
execution request on said client machine, a properties 
file for a Java application to be installed, wherein said 
properties file (1) describes said Java application, (2) 
specifies zero or more executable extensions which are 
required for executing said Java application, and (3) 
specifies a run-time environment which is required for 
executing said Java application; 
means for installing said Java application on said client 
machine using said properties file; and 
means for automatically installing, by said client machine, 
one or more dependencies of said Java application, 
wherein said dependencies comprise said required 
executable extensions and said required run-time 
environment, further comprising: 
means for parsing said properties file to locate said 
dependencies; and 
for each of said located dependencies which is not 
already installed on said client machine, means for 
automatically recursively (1) retrieving a properties 
file for said located dependency, (2) installing said 
located dependency, and (3) installing any depen- 
dencies identified when parsing said retrieved prop- 
erties file of said located dependencies, provided said 
identified dependency is not already installed on said 
client machine. 

11. The system according to claim 10, further comprising: 

means for revising said properties file for said Java 
application, wherein said means for automatically 
retrieving and said means for automatically installing 
then use said revised properties file. 

12. The system according to claim 10, wherein said means 
for automatically installing one or more dependencies fur- 
ther comprises means for dynamically determining, for at 
least one of said dependencies, a location from which said 
at least one dependency is to be installed. 

13. The system according to claim 10, further comprising: 

means for creating a registry file on said client machine, 
wherein said created registry file contains entries cor- 
responding to said properties file, said entries being 
tailored to said client machine; and 
means for using said created registry to construct said 
run-time environment for said Java application on said 
client machine. 

14. The system according to claim 13, further comprising: 

means for receiving a request to execute a selected Java 
application on said client machine; 

means for constructing a proper run-time environment for 
said selected Java application using its corresponding 
registry file; and 

means for starting execution of said selected Java appli- 
cation in said constructed environment. 

15. The system according to claim 14, wherein said means 
for constructing further comprises: 

means for reading said corresponding registry file to 
determine current dependencies of said Java 
application, wherein said current dependencies com- 
prise currently-required extensions and a current run- 
time environment for said Java application; 

means for ensuring that each of said current dependencies 
of said selected Java application is installed; 

means for setting appropriate environment variables for 
said current run-time environment; and 

means for setting appropriate environment variables for 
said currently-required extensions. 

16. The system according to claim 15, further comprising 
means for setting one or more parameters of said selected 
Java application using values specified in said corresponding 
registry file. 

17. The system according to claim 16, further comprising 
means for updating said parameters in said registry file. 

18. The system according to claim 13, further comprising: 

means for updating said current run-time environment in 
said registry file; and 
means for updating said currently-required extensions in 
said registry file. 

19. A method for installing a Java application on a client 
machine in a computing environment capable of having a 
connection to a network, comprising steps of: 

automatically retrieving, responsive to an execution 
request on said client machine, a properties file for a 
Java application to be installed, wherein said properties 
file (1) describes said Java application, (2) specifies 
zero or more executable extensions which are required 
for executing said Java application, and (3) specifies a 
run-time environment which is required for executing 
said Java application; 

installing said Java application on said client machine 
using said properties file; and 

automatically installing, by said client machine, one or 
more dependencies of said Java application, wherein 
said dependencies comprise said required executable 
extensions and said required run-time environment, 
further comprising steps of: 

parsing said properties file to locate said dependencies; 
and 

for each of said located dependencies which is not 
already installed on said client machine, automati- 
cally recursively (1) retrieving a properties file for 
said located dependency, (2) installing said located 
dependency, and (3) installing any dependencies 
identified when parsing said retrieved properties file 
of said located dependencies, provided said identi- 
fied dependency is not already installed on said client 
machine. 

20. The method according to claim 19, further comprising 
the step of: 

revising said properties file for said Java application, 
wherein said automatically retrieving step and said 
automatically installing step then use said revised prop- 
erties file. 
21. The method according to claim 19, wherein said 
automatically installing one or more dependencies step 
further comprises the step of dynamically determining, for at 
least one of said dependencies, a location from which said 
at least one dependency is to be installed. 

22. The method according to claim 19, further comprising 
the steps of: 

creating a registry file on said client machine, wherein 
said created registry file contains entries corresponding 
to said properties file, said entries being tailored to said 
client machine; and 

using said created registry to construct said run-time 
environment for said Java application on said client 
machine. 

23. The method according to claim 19, comprising the 
steps of: 

receiving a request to execute a selected Java application 
on said client machine; 

constructing a proper run-time environment for said 
selected Java application using its corresponding reg- 
istry file; and 

starting execution of said selected Java application in said 
constructed environment. 

24. The method according to claim 23, wherein said 
constructing step further comprises the steps of: 

reading said corresponding registry file to determine 
current dependencies of said Java application, wherein 
said current dependencies comprise currently-required 
extensions and a current run-time environment for said 
Java application; 

ensuring that each of said current dependencies of said 
selected Java application is installed; 

setting appropriate environment variables for said current 
run-time environment; and 

setting appropriate environment variables for said 
currently-required extensions. 

25. The method according to claim 24, further comprising 
the step of setting one or more parameters of said selected 
Java application using values specified in said corresponding 
registry file. 

26. The method according to claim 25, further comprising 
the step of updating said parameters in said registry file. 

27. The method according to claim 22, further comprising 
the steps of: 

updating said current run-time environment in said reg- 
istry file; and 

updating said currently-required extensions in said regis- 
try file. 

28. The method according to claim 19, wherein said Java 
application is a Java applet. 

29. A method for improving manageability and usability 
of a Java environment in a computing environment, com- 
prising steps of: 

storing an identification of one or more dependencies of 
a Java application, wherein said dependencies comprise 
a run-time environment, other than a browser, which is 
required for executing said application and zero or 
more extensions required for executing said applica- 
tion; and 

installing said Java application, wherein said installing 
step further comprises the step of using said stored 
identification to automatically locate and install said 
dependencies of said Java application. 
30. A method of enabling an applet to execute outside a 
browser, comprising steps of: 

storing information pertaining to execution of said applet, 
wherein said information comprises: (1) an identifica- 
tion of one or more permissible run-time environments 
in which said applet may be executed, other than said 
browser, and (2) an identification of zero or more 
executable extensions on which said applet is 
dependent, as well as a corresponding location from 
which each of said identified executable extensions 
may be installed; and 

installing said applet using said stored information. 

31. The method of enabling an applet to execute outside 
a browser according to claim 30, wherein said installing step 
further comprises steps of: 

ensuring that a selected one of said permissible run-time 
environments is available; and 

ensuring that each of said identified executable extensions 
is installed, and installing, from the corresponding 
location, any of said identified executable extensions 
that are not already installed. 
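
By way of illustration only, the stored information of claim 30 and the checks of claim 31 might be sketched as follows. The structure `APPLET_INFO`, the example locations, and the helper names are hypothetical and are assumed solely for this sketch.

```python
# Hypothetical stored information for one applet: permissible run-time
# environments (other than a browser) and, for each extension, a location
# from which it may be installed.
APPLET_INFO = {
    "runtimes": ["jre-1.2", "jre-1.1"],
    "extensions": {
        "ext-a": "http://example.com/ext/ext-a.jar",
        "ext-b": "http://example.com/ext/ext-b.jar",
    },
}


def select_runtime(info, available):
    """Return the first permissible run-time that is available, else None."""
    for runtime in info["runtimes"]:
        if runtime in available:
            return runtime
    return None


def extensions_to_install(info, installed):
    """Map each extension not yet installed to its corresponding location."""
    return {
        name: location
        for name, location in info["extensions"].items()
        if name not in installed
    }


runtime = select_runtime(APPLET_INFO, available={"jre-1.1"})
pending = extensions_to_install(APPLET_INFO, installed={"ext-a"})
```

In this sketch `pending` holds only the extensions that remain to be fetched, each paired with the location recorded for it in the stored information.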

32. The method of enabling an applet to execute outside 
a browser according to claim 31, further comprising the step 
of executing said installed applet using said selected one of 
said permissible run-time environments and said installed 
executable extensions. 

33. A method for executing a Java application without 
using a browser on a client machine in a computing envi- 
ronment capable of having a connection to a network, 
comprising steps of: 

requesting, by a user, execution of a selected Java appli- 
cation on said client machine; 
constructing, by said client machine responsive to said 
request, a run-time environment for said selected Java 
application using information retrieved from a registry 
file, wherein said registry file contains entries specify- 
ing values for properties of said selected Java 
application, said values of said entries being tailored to 
said client machine, further comprising steps of: 
setting environment data using environment data val- 
ues from said registry file; 
automatically installing one or more dependencies of 
said selected Java application using dependency data 
values from said registry file, further comprising 
steps of: 

for each of said dependencies which is not already 
installed on said client machine, automatically 
recursively (1) retrieving a properties file for said 
dependency, (2) installing said dependency, (3) 
constructing a run-time environment for said 
dependency using information retrieved from its 
registry file, and (4) installing any dependencies 
identified in said retrieved properties file of said 
dependencies, provided said identified depen- 
dency is not already installed on said client 
machine; 

automatically locating a run-time registry file for a 
current run-time specified in said registry file of said 
selected Java application; 
setting environment data using environment data val- 
ues from said located run-time registry file; and 
invoking a virtual-machine independent interface for 
calling native libraries of code in said run-time 
environment; and 
executing said selected Java application in said con- 
structed run-time environment. 



